Query: 368 HLDLDIKEGEKIAILGRSGSGKSTLASLLRGDLKASQGEITLGDADVSIVGDCISNYIGV 427

+ +++GEK+A+LGRSGSGKST +L+ G IiK G +TL + +++ D I++ + V 
Sbjct: 355 NFSFTLRQGEKMALLGRSGSGKSTSLMjIEGALKPDSGSVTLNGVETAIiLKDQIADAVAV 414 

Query: 428 IQQAPYLFlWTLLNNIRIGNQDASEEDVW»/LERVGLKE^WTDLSDGljYT^WDEAGLRFS 487

4 Q P+LF+T++LNNIR+GN +AS+EDV + ++V L + + L DG +T V E G+RFS 
Sbjct: 415 LNQKPHLFDTSILNNIRLGNGEASDEDVRRAAKQVS<LHDYIESLPDGYHTSVQETGIRFS 474 

Query: 488 GGERHRIALARILLKDVPIVILCEPTOGLDPITEQALLRVFMKELEGKTLVWITHHLKGI 547
GGER RIALARILL+D PI+ILDEPTVGLDPITE+ L+ + L+GKT++WITHHL G+

Sbjct: 475 GGERQRIAIARILLQDTPIIILDEPTVGLDPITERELMETVFEVLKGKTILWITHHLAGV 534 

Query: 548 EHADRILFIENGQLELEGSPQELSQSSQRYRQL 580 
E AD+I+F+ENG+ E+EG+ +EL +++RYR+L 
Sbjct: 535 EAADKIVFLENGKTEMEGTHEELLAANERYRRL 567



A related GBS gene <SEQ ID 8861> and protein <SEQ ID 8862> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 8 
McG: Discrim Score: -15.90 
GvH: Signal Score (-7.5): 1.97 

Possible site: 49 
>>> Seems to have no N-terminal signal sequence
ALOM program count: 7 value: -12.84 threshold: 0.0
INTEGRAL Likelihood =-12. Transmembrane 260 - 276 ( 258 -
INTEGRAL Likelihood = -9. Transmembrane 172 - 188 ( 147 -
INTEGRAL Likelihood = -6 Transmembrane 150 - 166 ( 147 -
INTEGRAL Likelihood = -6 Transmembrane 31 - 47 ( 29 -
INTEGRAL Likelihood = -3 Transmembrane 68 - 84 ( 67 -
INTEGRAL Likelihood = -1 Transmembrane 293 - 309 ( 292 -
INTEGRAL Likelihood = -0 Transmembrane 494 - 510 ( 493 -
Likelihood = 3
modified ALOM score: 3.07

*** Reasoning Step: 3



Final Results 

bacterial membrane --- Certainty=0.6137 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

ORF00997(346 - 2052 of 2364)

EGAD|98910|BS3866(1 - 571 of 575) transport ATP-binding protein cydd {Bacillus subtilis}
OMNI|NT01BS4517 ABC transporter CydC, putative SP|P94367|CYDD_BACSU TRANSPORT ATP-BINDING
PROTEIN CYDD. GP|1783253|dbj|BAA11730.1||D83026 homologous to many ATP-binding transport
proteins; hypothetical {Bacillus subtilis} GP|2636408|emb|CAB15899.1||Z99123 ABC membrane
transporter (ATP-binding protein) {Bacillus subtilis} PIR|D69611|D69611 ABC transporter
required for expression of cytochrome bd (ATP- ) cydD - Bacillus subtilis
%Match = 31.9
%Identity = 45.2 %Similarity = 69.1
Matches = 257 Mismatches = 172 Conservative Sub.s = 136



LKKDISIN*SMLWEEMMFKIPLFKELKTDQWIKPFFKQYKVSLVIALFLGFMTFFSASALMFNSGYLISKSASLPSNILL 

=1 = = ll 1 = II = 1 = =111 = 1 III: III 11=1111=1= I Mil 

MKKEEWILPYIKQNARLFVLVIFL3AVTIFSAAFLMFTSGFLISKAATRPENILL 



540 570 600 630 660 690 720 750

vWPIVLTRAFGIGRPVFRYIERLTSHI>JWVLRMrSQI^LKLra^ 

= ||||| | III I I 11=111 1= =l = = I =l = = ll= II 1= = = II 11-1 = 1 = 111 =11= =1 = 11 
IYVPIVAVRTFGIARSVSRYVERLVGHHIILKIVSDMRVRLYNMLEPGALMLRSRFRTGDMLGILSEDIEHLQDAFLKTI 
70 80 90 100 110 120 130 

FPTIIAWILYSFIIIATGFFSLWFALMMLLYLAIK IFL \L,, C ^t^JIGARQTREKELKNHLYTDLTDNVLGISDWIFSQR
II I I =lh >|| llll 11::= III = = = 111= 1=1 h = I =1 11= 111 1=1=111=11 I 
FPAISALLLYAVSVIftLGFFSWPFAILLALYLFVLVVLFPWSLL^RAKNAKLKSGRNVLYSRLTDAVMGVSDt'MFSGR 
150 160 170 180 190 200 210

1020 1050 1080 1110 1140 1194 1224 

GQEYVALHERSESELMAVQKKIRSFDNRRALIVELVFGFIiAILVIIWASNQFIGHRGGE--ANWIAAFVLTVFPLSFAFA 
: :: ,|, | : :„| : | | : : : | :|.. | : | : || |||||| ||II = MI 

RHAFIDAYEKEERDWFELERKKQRFTRWRDFAAQCLVAGLILLMLFWTAGQ- - -QADGEL&KTMIAAFVLWFPLTEAFL

230 240 250 260 270 280 290 

1254 1284 1302 1332 1362 1392 1422 1452 

GLSAAAQETNKYSDSIHRLNELS ETYFETTQNQLPNKPYDFSVKNLSFQYKPQEKWVLHHLDLDIKEGEKIAILGR 

II I I I III |=| =: : |: | : :: :,::| | |||:: : :::|||:|:|||

PLSDALGEVPGYQDSIRR^mOTAPQPEMQTESGDQILDLQDVTLAFRDVTFSY-DNSSQVLHNFSFTLRQGEKMALLGR 
310 320 330 340 350 360 370 

1482 1512 1542 1572 1602 1632 1662 1692 

SGSGKSTI^SLLRGDLICASQGEITLGDADVSIVGDCIS^IGVIQQAPYXFOT'TLLNTFRIGNQDASEEDWIOTjERVGL

mini =1= i ii i =n = = = = i n= = i= i i= i = i = = n mi =imn = = = i i 

SGSGKSTSJ^IEGALKPDSGSVTLNGVETALLroQIADAVAVI^^ 

390 400 410 420 430 440 450 

1722 1752 1782 1812 1842 1872 1902 1932

KEMVTDLSDGLYTMVDEAGLRFSGGERHRIAIARILLKDVPIVILDEPOTGLDPITEQALLRVFMKELEGKTLVWITHHL 

= = i ii =i m mniiimiiiinim imiiiimimm i= = mmmnn 

HDYIESLPDGYHTSVQETGIRFSGGERQRIALARILLQDTPIIILDEPTVGLDPITERELMETVFEVLKGKTILWITHHL 
470 480 490 500 510 520 530 


1962 1992 2022 2052 2082 2112 2142 2172 

KGIEHADRILFIENGQLELEGSPQELSQSSQRYRQLKA3DDGDL**LIGAINK***KNIP*LLF*HCGMFFYYUIFAF*K 

m nmmm mn m = = = 111 = 1 1 

AGVEAADKIVFLENGKTEMEGTHEELIAANERYRRLYHLDVPVK 



There is also homology to SEQ ID 478. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
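The summary statistics reported with the cydD alignment above (%Identity = 45.2, %Similarity = 69.1, Matches = 257, Mismatches = 172, and a conservative-substitution count read here as 136) can be cross-checked arithmetically. The sketch below is illustrative only. It assumes that %Similarity counts identical plus conservative positions, and it infers the denominator from the printed percentage, because the text does not state the length over which the percentages were computed. The function names are hypothetical.

```python
def implied_alignment_length(matches: int, pct_identity: float) -> int:
    """Infer the length over which %Identity appears to have been computed.

    Assumption: %Identity = 100 * matches / length, with length an integer.
    """
    return round(100 * matches / pct_identity)


def percent(count: int, length: int) -> float:
    """Percentage of `count` over `length`, to one decimal place."""
    return round(100 * count / length, 1)


# Counts as printed in the text (conservative count read as 136).
matches, mismatches, conservative = 257, 172, 136

length = implied_alignment_length(matches, 45.2)
identity = percent(matches, length)
similarity = percent(matches + conservative, length)

# Positions not accounted for by the three printed counts. These may be
# gap positions, but that is an assumption; the text does not say.
unaccounted = length - (matches + mismatches + conservative)

print(length, identity, similarity, unaccounted)
```

Under these assumptions both printed percentages are reproduced from a single inferred length, which supports reading the split count "13 6" as 136.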

Example 1616 

A DNA sequence (GBSx1711) was identified in S.agalactiae <SEQ ID 4987> which encodes the amino
acid sequence <SEQ ID 4988>. This protein is predicted to be spore germination protein C3 (ispB). 
Analysis of this protein sequence reveals the following: 

Possible site: 45 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.06 Transmembrane 111 - 127 ( 111 - 128) 



Final Results 

bacterial membrane --- Certainty=0.1426 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14190 GB:Z99115 heptaprenyl diphosphate synthase component II 
[Bacillus subtilis] 
Identities = 101/318 (31%), Positives = 184/318 (57%), Gaps = 5/318 (1%) 

Query: 8 YPELKKNIDETNQLIQERIQVRWKDIEAALSQLTAAGGKQLRPAFFYLFSQLGNECENQDT 67 

Y L +ID + +++ ++ + A L AGGK++RP F L G+ D 

Sbjct: 35 YSFLNDDIDVIERELEQTVRSDYPLLSEAGLHLLQAGGKRIRPVFVLLSGMFGD---YDI 91 
Query: 68 QQLKKIAASLEILHVATLIHDDVIDDSPLRRGKMTIQSKFGKDIAVYTGDLLFTVFFDLI 127
++K +A +LE++H+A+L+HDDVIDD+ LRRG TI++K+ IA+YTGD + +++
Sbjct: 92 NKIKYVAVTLEMIHMASLVHDDVIDDA3LRRGKPTIKAKWDNRIAMYTGDYMLAGSLEMM 151

Query: 128
V 4GE++Q+ +YN +Q +
Sbjct: 152

Query: 188
Sbjct: 211

Query: 248
i+ +NP + + ++ E +E I + ++++++ +KA +N LP+
Sbjct: 271 0,-KNPALKlIQIjKLINSETTQEQLEPIIEEIKKTDA-EASMAVSEMYLQKAFQKLNTLPR 329

Query: 308 iSAKKQLLQLTNYLLKRK 325
A+ L + Y+ KRK
Sbjct: 330 iRARSSLAAIAKYIGKRK 347

There is also homology to SEQ ID 284. An alignment of the GAS and GBS proteins is shown below: 

Identities = 65/227 (28%) , Positives = 98/227 (42%) , Gaps = 9/227 (3%) 

Query: 43 LPAFFYLFSQLGNKENQDTQQLKKIAASLEILHVATLIHDDV- -IDDSPLRRGM 100
IP + Q+ +AA+LE++H +LIHDD+ +D+ RRG
Sbjct: 36

Query: 101
+FG+ A+ GD L
Sbjct: 94

Query: 159
Sbjct: 154

Query: 216
Sbjct: 214



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
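The alignment summary line reported in this example ("Identities = 101/318 (31%), Positives = 184/318 (57%), Gaps = 5/318 (1%)") gives each figure as a count over the alignment length, followed by a whole-number percentage. A minimal sketch of how such a line might be parsed and checked is given below. The function names are hypothetical, and the use of truncation rather than rounding is an observation about this one line, not a statement about how the alignment program computes its percentages.

```python
import re

# Matches fields such as "Identities = 101/318 (31%)".
SUMMARY_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)"
)


def parse_summary(line: str) -> dict:
    """Return {field: (count, length, printed_percent)} for one summary line."""
    return {
        name: (int(count), int(length), int(pct))
        for name, count, length, pct in SUMMARY_RE.findall(line)
    }


def truncated_percent(count: int, length: int) -> int:
    """Whole-number percentage obtained by discarding the fraction."""
    return (100 * count) // length


line = "Identities = 101/318 (31%), Positives = 184/318 (57%), Gaps = 5/318 (1%)"
summary = parse_summary(line)

for name, (count, length, printed) in summary.items():
    print(name, count, length, printed, truncated_percent(count, length))
```

For this line, all three printed percentages agree with the truncated value (for example, 101/318 is about 31.8%, printed as 31%).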

Example 1617 

A DNA sequence (GBSx1712) was identified in S.agalactiae <SEQ ID 4989> which encodes the amino
acid sequence <SEQ ID 4990>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.3995 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA25232 GB:M58315 dipeptidyl peptidase IV [Lactococcus lactis] 
Identities = 385/767 (50%) , Positives = 504/767 (65%) , Gaps = 21/767 (2%) 

Query: 1 MRYNQFSYIPTKPNFAFEELKGLGFPtNKKNSDKANLEAFLRHSFLNQTDTDYALSLLIV 60 

MR+N FS + +E EL LGF + +K L+ FL S + TD L 
Sbjct: 1 MRFNHFSIVDKNFDEQLAELDQLGFRWSVFVIDEKKIIjKDFLIQSPSDMTD LQA 53 
Query: 61 DAKTDALTFFKSNSDLTLENLQWIYLQLLGFIPFVDFKDPKA? LQDINFPVSY 113

A+ D + F KS+ +L E I LQLL F+P DF+ KAF L I ++ 

Sbjct: 54 TAELDVIEFLKSSIELDWEIFWNIALQLLDFVPNFDFEIGKAFEYAKNSNLPQIEAEMTT 113 

Query: 114 DNIFQSLHBILLACRGKSGNTLIDQLVADGLIJIADNEiyHFFNGKSIATFNTMQLIREVVYV 173

+NI + ++LL R K+G L++ V++GLL DNHYHFFN KSLATF+++ L REV++V 
Sbjct: 114 ENIISAFYYLLCTRRKNGMILVEHWVSEGLLPLDNHYHFFNDKSLATFDSSLLEREVLWV 173 

Query: 174 ETSLDTMSSGEHDLVKOTIIRPTraHTIPTMMTASPYHQGINDPAADQKTYQMEGAtAVK 233 
E4 +D+ GE+DL+K+ IIRP + +P +MTASPYH GIND AD + M L K

Sbjct: 174 ESPVDSEQRGENDLIKIQIIRPKSTEKLPVVMTASPYHLG1NDKANDLALHDMNVELEEK 233 

Query: 234 QPKHIQVDTKPFKEEVKHPSKLP1-SPAT3SFTHIDSYSLNDYFLSRGFANIYVSGVGTA 292 
I V+ K ++ +LPI A FTH +YSLNDYFL+RGFA+IYV+GVGT 

Sbjct: 234 TSHEIHVEQKLPQKLSAKAKELP1VDKAPYRFTHGKTYSLMDYFLTRGFASIYVAGVGTR 293

Query: 293 GSTGFMTSGDYQQIQSFKAVIDWLNGKVTAFTSHKRDKQVKANWSNGLVATTGKSYLGTM 352 

S GF TSGDYQQI S AVIDWLNG+ A+TS K+ ++KA+W+NG VA TGKSYLGTM 
Sbjct: 294 SSDGFQTSGDYQQIYSMTAVIDWLNGRARAYTSRKKTKEIKASWANGKVAMTGKSYLGTM 353 


Query: 353 STGLATTGVEGLKVIIAEAAISTWYDYYRENGLVCSPGGYPGEDLDVLTELTYSRNLLAG 412 

+ G ATTGVEGL+VI +AEA IS+WY+YYRENGLV SPGG+PGEDLDVL LTYSRNL 
Sbjct: 354 AYGAATTGVEGLEVILAEAGISSWYNYYRENGLVRSPGGFPGEDLDVLAALTYSRNLDGA 413 

Query: 413 DYIKl^CYQALrjffiQSKAIDRQSGDYNQYWHDPJ^LTt^mVKSRWYTHGLQDWMVKP 472

D++K N Y+ L E + A+DR+SGDYNQ+WHDRNYL + + VK+ V+ HGLQDWNV P 
Sbjct: 414 DFLKGNAEYEKRLAEMTAALDRKSGDYNQFVfflDROTLIOTDKYKADVLIVHGLQDWNOTP 473 

Query: 473 RHVYKVFNALPQTIKKHLFLHQGQHVYMHN5«QSIDFRESMNALLSQELLGIDNHFQLEEV 532 
Y + ALP+ KH FLH+G H+YM++WQSIDF E++NA +LL D + h V

Sbjct: 474 EQAYWFWKADPEGHAKHAFLHRGAHlYMNSWQSIDFSETINAYFVAKIjLDRDLNLNLPPV 533 

Query: 533 IWQDMTTEQTWQVLDAFGGNHQEQIGLGD SKKLIDNHYDKEAFDTYCKDENVFKNDL 589 

I Q+N+ +Q W +++ FG N Q ++ LG S DNHYD E F Y KDFNVFK DL 
Sbjct: 534 ILQENSKTJQVWTMMNDFGANTQIKLPLGKTAVSFAQFD^^ 593

Query: 590 FKGMJKTNQITINLPLKKNYLLNGQCKLHLRVKTSDKKAILSAQILDYGPKKRFKDTPTI 649

F+ NK N+ I+L L +NG 4L LR+K +D K LSAQILD+G KKR +D + 

Sbjct: 594 FE--NKANEAVIDLELPSMLTINGPVELELRLKLNDTKGFLSAQILDFGQKKRLEDKARV 651 


Query: 650 KFLNSLDNGKNFAREALRELPFTKDHYRVISKGVLNLQNRTDLLTIEAIEPEQWFDIEFS 709 

K LD G+NF + L ELP + Y++I+KG NLQN+ +LLT+ ++ ++WF 1+F 
Sbjct: 652 KDFKAnjDRGRNFMLDDLVELPLVESPYQLITKGFTNLQNQ-KLLTVSDLKADEWFTIKFE 710 

Query: 710 LQPSIYQLSKGDNLRIILYTTDFEHTIRDHASYSITVDLSQSYLTIP 756

LQP+IY L K D LR+ILY+TDFEHT+RDN + +DLSQS L IP 
Sbjct: 711 LQPTIYHLEKADKLRVILYSTDFEHTVRDNRKVTYEIDLSQSKLIIP 757 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4991> which encodes the amino acid
sequence <SEQ ID 4992>. Analysis of this protein sequence reveals the following:

Possible site: 16 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2553 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 481/758 (63%) , Positives = 587/758 (76%) , Gaps = 4/758 (0%)

Query: 1 ^YNQFSYIPTKPNEAFEELKGLGFPLNKKNSDKANLEAFLRHSFLNQTDTDYALSLLIV 60 

MRYNQFSYIPT A EELK LGF L+ + + KA+LE+FLR F + D +DY LS LI 
Sbjct: 1 ^YNQFSYIPTSLERAAEELKELGFDLDLQKTAKASLESFLRKLFFHYPDSDYPLSHLIA 60 
Query: 61 DAKTDALTFFKSNSDLTLENLQWIYLQLLGFIPFVDFKDPKAFLQDINFPVSYDN--IFQ 118

DAL+FF+S +L+ E + LQ+LGFIP VDF + AFL + FP+ +D I + 
Sbjct: 61 KNDMDALSFFQSEQELSKEVFDLIALQVLGFIPGVDFTEADAFLDKLAFPIHFDETEIIK 120 

Query: 119 SLHHLIACRGKSGNTLIDQLVADGLLHADNHYHFFNGKSLATFNTNQL1REWYVETSLD 178 

+HHLLA R KSG TLID LV+ G+L DN YHFFNGKSLATF+T+QLIREWYVE LD 
Sbjct: 121 HIHHLIATRCKSGMTLIDDLVSQGMLTMDNDYHFFNGKSIATFDTSQLIREWYVEAPLD 180 

Query: 179 TMSSGEHDLVKVNIIRPTTEHTIPTMMTASPYHQGINDPAADQKTYQMEGAIAVKQPKHI 238 

T G+ DL+KVNIIRP ++ +PT+MT SPYHQGIN+ A D+K Y+ME L VK+ + I 
Sbjct: 181 TDQDGQLDLIKYtJI IRPQSQKPLPTLMTPSPYHQ3IMEVANDKKLYRMEKELWKKRRQI 240 

Query: 239 QVDTKPFKEEVKHPSKLPISPATESFTHIDSYSLNDYFLSRGFANIYVSGVGTAGSTGFM 298 

V+ + F P KLPI ESF++I+SYSLNDYFL+RGFANIYVSGVGTAGSTGFM 

Sbjct: 241 TVEDRDFIPLETQPCKLPIGQNtiESFSYINSYSLNDYFLARGFANIYVSGVGTAGSTGFM 300

Query: 299 TSGDYQQIQSFKAVIDWLTJGKVTAFTSHKRDKQVKAM'JSNGLVATTGKSYLGTMSTGLAT 358 

TSG+Y QI+SFKAVIDWLNG+ TA+TSH + QV+A+W+NGLV TTGKSYLGTMSTGIAT 
Sbjct: 301 TSGNYAQIESFKAVIDWLNGRATAYTSHSKTHQVRADWANGLVCTTGKSYLGTMSTGIAT 360 

Query: 359 TGVEGLKVI IAEAAI STWYDYYRENGLVCS PGGYPGEDLDVLTELTYSRNLIAGDYI KNKT 418 

TGV+GL + 1 IAE+AI S+WY+YYRENGLVCSPGGYPGEDLDVLTELTYSRNLLAGDY+ ++N 
Sbjct: 361 TGVDGLAMI IAESAI SSWYNYYRENGLVCSPGGTYPGEDLDVLTELTYSRNLIAGDYLRHN 420 

Query: 419 DCYQAL1^QSKAIDRQSGDYNQYWHDRI^LTH\^1^SRWYTHGLQD™VKPRHVYKV 478 

D YQ LLN+QS+A+DRQSGDYNQ+WHDRNYL + + +K WYTHGLQDWNVKPR VY++ 
Sbjct: 421 DRYQELLNQQSQALDRQSGDYNQFWHDFJSTYLKNAHQIKCDVWTHGLQDWNVKPRQVYEI 480 

Query: 479 FNAI J PQTIKKHLFLHQGQHVYMHNWQSIDFRES^5NALLSQELLGIDNHFQLEEVIVIQDNT 538 

FNALP TI KHuFLHQG+HVYMHNWQSIDFRESMsIALL Q+LLG+ NFL E+IWQDNT 
Sbjct: 481 FNALPSTINKHLFLHQGEHVYMHNMQSIDFRESMNALLCQKLLGLANDFSLPEMIWQDNT 540 

Query: 539 TEQTWQVLDAFGGIfflQEQIGLGDSKKLIDNHYDKEAFDTYCKDFlWFKiroLFKGNNKTNQ 598 

Q WQ FG + +++ LG LIDNHY ++ F Y KDF FK LFKG K NQ 
Sbjct: 541 CPQNWQERKVFGTSTIKELDLGQELLLIDNHYGEDEFKAYGKDFRAFKAALFKG--KANQ 598 

Query: 599 ITINLPLKKNYLLNGQCKLHLRVKTSDKKAILSAQILDYGPKKRFKDTPTIKFLNSLDNG 658 

I++ L++4 +NG+ L L+VK+S+ K +LSAQILDYG KKR D P +S+DNG 
Sbjct: 599 ALIDILLEEDLPINGEIVLQLKVKSSENKGLLSAQILDYGKKKRLGDLPIALTQSSIDNG 658 

Query: 659 lOTFAREALRELPFTKDHYRVISKGVLNLQNRTDIiLTIEAIEPEQWFDIEFSLQPSIYQLS 718 

+NF+RE L+ELPF +D YRVISKG +NLQNR +L +IE I +W + LQP+IY L 
Sbjct: 659 QNFSREPLKELPFREDSYRVISKGFMNLQNRMttSSIETIPNNKWMTVRLPLQPTIYHLE 718 

Query: 719 KGDNLRIILYTTDFEHTIRDNASYSITVDLSQSYLTIP 756 

KGD LR+ILYTTDFEHT+RDN++Y++T+DLSQS L +P 
Sbjct: 719 KGDTLRVILYTTDFEHTVRDNSNYALTIDLSQSQLIVP 756 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1618 

A DNA sequence (GBSx1713) was identified in S.agalactiae <SEQ ID 4993> which encodes the amino
acid sequence <SEQ ID 4994>. This protein is predicted to be PrfA. Analysis of this protein sequence 
reveals the following: 

Possible site: 54

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.3976 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10125> which encodes amino acid sequence <SEQ ID
10126> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA65740 GB:X97014 PrfA [Listeria seeligeri]
Identities = 54/181 (29%) , Positives = 95/181 (51%) , Gaps = 1/181 (0%) 



Query: 38 DYTYILKDGIVKQSVLSKYGTEFNLRYVTGLEITSILNTDYSQHMGEPYMVRIESETAHF 97
+Y L +G+ K + +S+ G NL+Y G I D + +G YN+ + SE A
Sbjct: 36 EYCIFLHEGVAKLTS I SESGDI LNLQYYKGAFI IMTGFI DTEKSLGY - YNLE WSEQAAA 94

Query: 98 YKOTRSTFLKDINMDIELQGYVTODFYHNI^EKSMKKMQCMLTNGRIGAISTQIjYDLSKMF 157
Y ++ S + ++ D++ Y+ D ++ S+ K +NG++G+I Q L+ ++
Sbjct: 95 YIIKISDLKELVSKDLKQLFYIIDTLQKQVSYSLAKFMDFSSNGKVGSICGQFLILAYVY 154

Query: 158 GEERDNGDIYINFVITOEELGKFCGISTGSSVSRILKQLKDDHIIRIEKQHIIimVEKLK 218
GEE NG +T +ELG GI+ S+VSRI+ +LK +++I + + I N+ LK
Sbjct: 155 GEETPNGIKITLEKLTMQELGCSSGIAHSSAVSRIISKLKQENVIEYKDSYFYIKNIAYLK 215



A related DNA sequence was identified in S.pyogenes <SEQ ID 4995> which encodes the amino acid 
sequence <SEQ ID 4996>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4088 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 186/223 (83%) , Positives = 203/223 (90%) 

Query: 1 MEEV^IIraQILQNYINSHNLPIIEKDYHKYLTFESLEEDYTYILKDGIVKQSVLSI<YGTEF 60 

+E+ +NH ILQ YI++HN PIIEK YHKYLTFESLEED+TYILKDGIVKQSVLSKYG EF 
Sbjct: 17 LEKSVNHHILQRYIDNHNFPIIEKSYHKYLTFESLEEDFTYILKDGIVKQSVLSKYGMEF 76 

Query: 61 NIjRYVTGLEITSIIjNTDYSQHMGEPYNTOIESETAHFYI<WRSTFLIuOINNDIELQGYVK 120 

NLRYVTGLEITS+LNT YS4 MGEPYNVRIESE A FYKVRRS FLKDIW DIELQGYVK 
Sbjct: 77 I^YWGLEITSVLNTGYSKDMGEPYI^IESEKASFYKVRRSAFLKIIINEDIELQGYVK 136 

Query: 121 DFYHHRLEKSMKKMQCMLTNGRIGAISTQLYDLSKIIFGEERDNGDIYINFVITNEELGKF 180 

DFYHNRL+KSMKKMQCMLTWGRIGAISTQ+YDL +FGEE NG I INFVITNEELGKF 
Sbjct: 137 DFYHNRLQKSMKKMQCMLTNGRIGAISTQIYDLMTLFGEELPNGQILINFVITNEELGKF 196 

Query: 181 CGI STGSSVSRILKQLKDDHI IRIEKQHII ITNVEKLKDHIVF 223 

CGIST SSVSRILKQLK+ +IIRI+KQHIIITN++KLKD+IVF 
Sbjct: 197 CGISTASSVSRILKQLKEKNI IRIDKQHI I ITNLDKliKDNIVF 239 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
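The "Final Results" listings in these examples assign a certainty value to each of three candidate locations (cytoplasm, membrane, outside), and the location with the highest certainty is the one marked "Affirmative". The sketch below shows one way such a listing might be read programmatically, using the values reported for the GBS protein of this example. The function name and regular expression are illustrative assumptions, not part of the prediction program; the pattern tolerates the stray spaces that appear inside the numbers in extracted text.

```python
import re

# One result line, e.g. "bacterial cytoplasm --- Certainty=0.3976 (Affirmative)".
RESULT_RE = re.compile(
    r"bacterial\s+(\w+)\s*(?:-+\s*)?Certainty\s*=\s*([0-9.\s]+?)\s*\(([^)]+)\)"
)


def parse_final_results(text: str) -> dict:
    """Return {location: (certainty, verdict)} for a Final Results listing."""
    results = {}
    for location, certainty, verdict in RESULT_RE.findall(text):
        # Remove spaces that extraction may have inserted, e.g. "0. 3976".
        results[location] = (float(certainty.replace(" ", "")), verdict.strip())
    return results


listing = """
bacterial cytoplasm --- Certainty=0. 3976 (Affirmative) < succ>
bacterial membrane --- Certainty=0 . 0000 (Not Clear) < succ>
bacterial outside --- Certainty=0. 0000 (Not Clear) < succ>
"""

results = parse_final_results(listing)
best = max(results, key=lambda location: results[location][0])
print(best, results[best])
```

Here the highest-certainty location is the cytoplasm, in agreement with the "Affirmative" flag in the listing.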

Example 1619 

A DNA sequence (GBSx1714) was identified in S.agalactiae <SEQ ID 4997> which encodes the amino
acid sequence <SEQ ID 4998>. Analysis of this protein sequence reveals the following: 
Possible site: 46 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-14.33 Transmembrane 167 - 183 ( 159 - 193) 
INTEGRAL Likelihood = -7.96 Transmembrane 18 - 34 ( 10 - 37) 
INTEGRAL Likelihood = -7.75 Transmembrane 373 - 389 ( 369 - 392) 



WO 02/34771 



PCT/GB01/04789 



-1805- 



INTEGRAL Likelihood = -5.68 Transmembrane 214 - 230 ( 212 - 234) 

INTEGRAL Likelihood = -4.78 Transmembrane 243 - 259 ( 241 - 262)

INTEGRAL Likelihood = -2.71 Transmembrane 48 - 64 ( 47 - 65) 

INTEGRAL Likelihood = -2.60 Transmembrane 283 - 299 ( 283 - 300) 

Final Results 

bacterial membrane --- Certainty=0.6731 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15662 GB:Z99122 similar to antibiotic resistance protein 
[Bacillus subtilis] 
Identities = 106/401 (26%) , Positives = 199/401 (49%) , Gaps = 21/401 (5%) 

Query: 3 DKLFNKHFIGITILNFIVYMVYYLFTVIIAFIATKELGVSTSQAGLATGIYIVGTLIARL 62 

D ++ K FI + ++N V++ +Y F ++ +ELG + SQ GL ++++ +1 R 

Sbjct: 5 DAIWTKDFIMVLLVNLFVFVFFYTFLTVLPIYTLQELGGTESQGGLLISLFLLSAIITRP 64 

Query: 63 IFGKQLEVLGRKLVLRGGaiFYLLTTIAYFYMPSIGVMYLTOFLNGFGYGWSTATNTIV 122 

G +E G+K + + L++ Y + + ++ +RF G + +++T T I 

Sbjct: 65 FSGAIWRFGKKRmiVSM&LFALSSFLYMPIHNFSLLLGLRFFQGIWFSILTTVTGAIA 124 

Query: 123 TAYIPADKRGEGINFYGLSTSLAAAIGPFVGTFMLDNLHINFKMVIVLCSILIAIWLGA 182 

IPA +RGEG+ ++ +S +LA AIGPF+G ++ ++F + ++ + +L + 

Sbjct: 125 ADIIPAKRRGEGLGYFAMSMNLAMAIGPFLGLNLMRV--VSFPVFFTAFALFMVAGLLVS 182 

Query: 183 FVFPVKNITLNPEQLAKSKSWTIDSF IEKKAIFITIIAFLMGISYASVLGFQKLY 237 

F+ V +K T+ F EK A+ I + + Y++V + ++ 

Sbjct: 183 FLIKVPQ SKDSGTTVFRFAFSDMFEKGALKIATVGLFISFCYSTVTSYLSVF 234 

Query: 238 TTEINLMWGAYFFIWALVITLTPJSMGRLMDAKGDKWVIiYPSYLFLTLGLALLGSAMG 297 

++L + YFF+ +A+ + + RP G+L D G V+YPS L ++GL +L 
Sbjct:' 235 AKSVDLSDISGYFFVCFAVTMMIARPFTGKLFDKVGPGIVIYPSILIFSVGLCMLSFTHS 294 

Query: 298 SVTYLLSGALIGFGYGTFMSCGQAASIKGVEEHRFNTAMSTYMIGLDLGLGAGPYILGLV 357 

+ LLSGA+IG GYG+ + C Q +1+ HR A +T+ D G+ G Y+ GL 
Sbjct: 295 GLMLLLSGAVIGLGYGSIVPCMQTLAIQKSPAHRSGFATATFFTFFDSGIAVGSYVFGL- 353 

Query: 358 KDGFLGAGVQSFRELFWIAAI I PWCGILYFLKSSRQVETK 398 

F+ + F ++ A + ++ +LY + E + 

Sbjct: 354 FVASA- -GFSAIYLTAGLFVLIALLLYTWSQKKPAEAE 389 



A related DNA sequence was identified in S.pyogenes <SEQ ID 4999> which encodes the amino acid 
sequence <SEQ ID 5000>. Analysis of this protein sequence reveals the following:



Possible site: 35

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-12.31 Transmembrane 202 - 218 ( - 225)
INTEGRAL Likelihood = -7.80 Transmembrane 53 - 69 ( - 71)
INTEGRAL Likelihood = -7.17 Transmembrane 407 - 423 ( - 425)
INTEGRAL Likelihood = -5.26 Transmembrane 249 - 265 ( 247 - 259)
INTEGRAL Likelihood = -3.77 Transmembrane 279 - 295 ( - 297)
INTEGRAL Likelihood = -2.23 Transmembrane 11 - 27 ( 10 - 27)
INTEGRAL Likelihood = -2.13 Transmembrane 83 - 99 ( 82 - 99)
INTEGRAL Likelihood = -1.91 Transmembrane 312 - 328 ( 311 - 328)

Final Results

bacterial membrane --- Certainty=0.5925 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

d antibiotic resistance protein 
Identities = 110/390 (28%) , Positives = 194/390 (49%) , Gaps = 11/390 (2%)

Query: 38 EKLFNKHWAITVINFIVYMVYyLFTVlIAFVATRELG&QTSQAGLATGIYILGTLLARL 97 

+ ++ K F+ + ++W V++ +Y F +ELG SQ GL +++L ++ R 

Sbjct: 5 DAIWTKDFIMVlLvMLFVFVFFYrFLTVLPIYTLQELGGTESQCSGLLISLFIiLSAIITRP 64 

Query: 98 I FGKQLEVFGRRLVLRGGAI FYLLTTLAYFYMPT I SMMYLVRFLNGFGYG WSTATNT IV 157 

G +E FG++ + + L++ Y + S++ +RF G + +++T T I 

Sbjct: S5 FSGAIVERFGKKRMAIVSMALFALSSFLYMPIHLTFSLLLGLRFFQGIWFSILTTVTGAIA 124 

Query: 158 TAYIPARKRGEGINFYGLSTSIAAAIGPFVGTFMLDNTLHIDFRMIIVLCSVLIGCVWGA 217 

IPA++RGEG+ ++ +S +LA AIGPF+G ++ + F + ++ + ++ + 

Sbjct: 125 ADIIPAKRRGEGLGYFAMSMNLAMAIGPFLGLNLMRV--VSFPVFFTAFALFrWAGLLVS 182 

Query: 218 FAFPWNMSLNAEQIAKTKSWTVDSFIEKKALFITAIAFIjMGIAYASvLGFQKIjYTSEIH 277 

FV + ++ + EKALI+ + Y++V + ++ + 

Sbjct: 183 FLIKVPQSKDSGTTVFR---FAFSDMFEKGALKIA , TVGLFISFCYSTVTSYLSVFAKSVD 239 

Query: 278 LTWGAYFFvVYALIITITRPAMGRLMDAKGDKl-rVLYPSYLFLAMGLFLLGSVSSGGSYL 337 

L+ + YFFV +A+ + I RP G+L D G V+YPS L ++GL +L SG L 
Sbjct: 240 LSDISGYFFVCFAVTMMIARPFTGKLFDKVGPGIVIYPSILIFSVGLCMLSFTHSGLMLL 299 

Query: 338 LSGALIGFGYGTFMSCGQAASIQGVDEHRFNTAMSTYMIGLDLGLGAGPYLLGLIKDLAL 397 

LSGA+IG GYG+ + C Q +IQ HR A +T+ D G+ G Y+ GL 
Sbjct: 300 LSGAVIGLGYGSIVPCMQTIAIQKSPAHRSGFATATFFTFFDSGIAVGSYVFGLF 354 

Query: 398 GSGVASFRHLFWLAAVIPLICTLLYLLKTK 427 

A F ++ A 4- LI LLY K 
Sbjct: 355 -VASAGFSAIYLTAGLFVLIALLLYTWSQK 383

An alignment of the GAS and GBS proteins is shown below. 

Identities = 328/396 (82%), Positives = 370/396 (92%), Gaps = 1/396 (0%) 

Query: 1 MEDKLFNKHFIGITILNFIVYMVYYLFTVIIAFIATKELGVSTSQAGLATGIYIVGTLIA 60 

ME+KLFNKHF+ IT++NFIVYMVYYLFJVIIAF+AT+ELG TSQAGLATGI YI +GTL+A 
Sbjct: 36 MEEKIiFNKHFVAIWINFIVYMVYYLFTVIIAFVATRELGAQTSQAGLATGIYILGTLLA 95 

Query: 61 RLIFGKQLEVLGRKLVLRGGAIFYLLTTIAYFYMPSIGVMYLVRFLNGFGYGWSTATNT 120 

RLIFGKQLEV GR+LVLRGGAI FYLLTTLAYFYMP+ 1 +MYLVRFLNGFGYGWSTATNT 
Sbjct: 96 RLIFGKQLEVFGRRLVLRGGAIFYLLTTLAYFYMPTISMMYLVRFLNGFGYGWSTATNT 155 

Query: 121 IVTAYIPADKRGEGINFYGLSTSLAAAIGPFVGTFMLDNLHINFKMVIVLCSILIAIWL 180 

IVTAYIPA KRGEGINFYGLSTSLAAAIGPFVGTFMLDNLHI+F+M+IVLCS+LI W+ 
Sbjct: 156 IVTAYIPARKRGEGINFYGLSTSI^AIGPFVGTFMLDj^HIDFRMIIVLCSVLIGCVVV 215 

Query: 181 GAFVFPVKNITLNPEQLAKSKSWTIDSFIEKKAIFITIIAFLMGISYASVLGFQKLYTTE 240 

GAF FPVKN++LN EQLAK+KSWT+DSFIEKKA+FIT IAFLMGI +YAS VLGFQKLYT+E 
Sbjct: 216 GAFAFPVKM^SIjNAEQILAKTKSWTVDSFIEKI<ALFITAIAFLMGIAYASVLGFQKLYTSE 275 

Query: 241 INIMTVGAYFFIWALVITLTRPSMGRLKDAKGDKI^YPSYLFLTLGIALLGSAMGSVT 300 

I+L TVGAYFF+VYAL+ IT+TRP+MGRLMDAKGDKWVLYPSYLFL +GL LLGS + 
Sbjct: 276 IHLTTVGAYFFVVYALIITITRPAMGRLMDAKGDKWVLYPSYLFLAMGLFLLGSVSSGGS 335 

Query: 301 YLLSGALIGFGYGTFMSCGQAASIKGVEEHRFNTAMSTYMIGLDLGLGAGPYILGLVKDG 360 

YLLSGALIGFGYGTFMSCGQAAS I +GV+EHRFNTAMSTYMIGLDLGLGAGPY+LGL+KD 
Sbjct: 336 YLLSGALIGFGYGTFMSCGQAASIQGVDEHRFNTAMSTYMIGLDLGLGAGPYLLGLIKDL 395

Query: 361 FLGAGVQSFRELFWIAAI I PWCGI LYFLKS - SRQV 395 

LG+GV SFR LFW+AA+IP++C +LY LK+ +RQV 
Sbjct: 396 ALGSGVASFRHLFWLAAVIPLICrLLYLLKrKTRQV 431 

A related GBS gene <SEQ ID 8863> and protein <SEQ ID 8864> were also identified. Analysis of this 
protein sequence reveals the following: 






GvH: Signal Score (-7.5): -5.21
Possible site: 46
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 7 value: -14.33 threshold:
INTEGRAL Likelihood =-14.33 Transmembrane - 183 ( 159 - 193)
INTEGRAL Likelihood = -7.96 Transmembrane - 34 ( 10 - 37)
INTEGRAL Likelihood = -7.75 Transmembrane - 389 ( 369 - 392)
INTEGRAL Likelihood = -5.68 Transmembrane - 230 ( 212 - 234)
INTEGRAL Likelihood = -4.78 Transmembrane - 259 ( 241 - 262)
INTEGRAL Likelihood = -2.71 Transmembrane - 64 ( 47 - 65)
INTEGRAL Likelihood = -2.60 Transmembrane - 299 ( 283 - 300)
Likelihood = 0.69
modified ALOM score: 3.37



*** Reasoning Step:

Final Results

bacterial membrane
bacterial outside
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

ORF01003(307 - 1494 of 1800)

EGAD|108032|BS3640(5 - 389 of 396) hypothetical protein {Bacillus subtilis}
GP|1684651|emb|CAB05383.1||Z82987 unknown similar to quinolon resistance protein NorA
{Bacillus subtilis} GP|2636170|emb|CAB15662.1||Z99122 similar to antibiotic resistance
protein {Bacillus subtilis} PIR|B70065|B70065 antibiotic resistance protein homolog ywoG -
Bacillus subtilis
%Match = 14.9
%Identity = 26.3 %Similarity = 53.4
Matches = 102 Mismatches = 178 Conservative Sub.s = 105



TTLTFWAV*Y*HLYYTIEISYLLIFL*NWENEIEKKEFFALEDKLFNKHFIGITII^FIVYMVYYLFTVIIAFIATKE 
| :: | || : ::|: |:: :| | :: :| 
MKKADAIWTKDFIbWLLVNLFVFVFFYTFLTVLPIYTLQE 



444 474 504 534 564 594 624 654 

LGVSTSQAGLATGIYIVGTLIARLIFGKQLEVLGRKLVLRGGAIFYLLTTLAYFYMPSIGVMYLVRFLNGFGYGWSTAT
II : II II = = :: HI I =1 =hl = " h = : | s : :: :|h I : s::| | 

LGGTESQGGLLISLFLLSAIITRPFSGAIVERFGKKRMAIVSMALFALSSFLYMPIHNFSLLLGLRFFQGIWFSIIjTTVT 
50 60 70 80 90 100 110 120 

684 714 744 774 804 834 864 894

OTIVTAYIPADKRGEGINFYGLSTSLAAAIGPFVGTFMLDKLHINFKIWIvLCSILIAIVVLGAFVFPVKNITmPEQLA 
I III :: :| =11 llllhl == ==l = ==== =1 =1= I 

(3AIAADIIPAKRRGEGLGYFAMSMNLA^IGPFLGI^LM--RWSFPVFFTAFALFWAGLLVSFLIK^PQSKDSGTTVF 
130 140 150 160 170 180 190 


924 954 984 1014 1044 1074 1104 1134 

KSKSWTIDSFIEKKAIFITIIAFLMGISYASVLGFQKLYTTEINLMTVGAYFFIVYALVITLTRPSMGRLMDAKGDKWVL 

lib I ! l»l = - -«l •■ III' =1= ■■ •■ II 1 = 1 I I 1 = 

R FAFSDMFEKGALKIATVGLFISFCYSTVTSYLSVFAKSTOLSDISGYFFVCFAVTMMIARPFTGKLFDKVGPGIVI 

210 220 230 240 250 260 270



1164 1194 1224 1254 1284 1314 1344 1374 

YPSYLFLTLGIjALLGSAMGSVTYLLSGALIGFGYGTFMSCGQAASIKGVEEHRFNTAMSTYMIGLDLGLGAGPYILGLVK 
III I :::|| «| = I I I I I = I I = I I I = = I I =1= III =h =1 h I l"ll 

YPSILIFSVGLCMLSFTHSGLMLLLSGAVIGLGYGSIVPCMQTLAIQKSPAHRSGFATATFFTFFDSGIAVGSYVFGL--
290 300 310 320 330 340 350 



1404 1434 1464 1494 1524 1554 1584 1614 

DGFLGAGVQSFRELFWIAAIIPWCGILYFLKSSRQVETKTI*KGGIKL*HKNMSVFLLLLMGLTSQNWR*KKG*MLLFV 
| :: | : :: :|| : | :

FVASAGFSAIYLTAGLFVLIALLLYTWSQKKPAEAEGKVS IAS 

360 370 380 390 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
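The transmembrane predictions in this example are reported one segment per line ("INTEGRAL Likelihood = ... Transmembrane start - end"), and the ALOM summary for the related GBS protein gives "count: 7 value: -14.33", which matches the number of segments and the most negative likelihood listed for the GBS protein. A minimal sketch of how those rows might be tabulated is shown below, using the seven segments reported for the GBS protein (the end position printed as "2S9" is read as 259). The names `Segment` and `parse_segments` are hypothetical.

```python
import re
from typing import List, NamedTuple


class Segment(NamedTuple):
    likelihood: float
    start: int
    end: int


# One row, e.g. "INTEGRAL Likelihood =-14.33 Transmembrane 167 - 183 ( 159 - 193)".
SEGMENT_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_segments(text: str) -> List[Segment]:
    """Extract (likelihood, start, end) for each predicted segment."""
    return [
        Segment(float(lik), int(start), int(end))
        for lik, start, end in SEGMENT_RE.findall(text)
    ]


listing = """
INTEGRAL Likelihood =-14.33 Transmembrane 167 - 183 ( 159 - 193)
INTEGRAL Likelihood = -7.96 Transmembrane 18 - 34 ( 10 - 37)
INTEGRAL Likelihood = -7.75 Transmembrane 373 - 389 ( 369 - 392)
INTEGRAL Likelihood = -5.68 Transmembrane 214 - 230 ( 212 - 234)
INTEGRAL Likelihood = -4.78 Transmembrane 243 - 259 ( 241 - 262)
INTEGRAL Likelihood = -2.71 Transmembrane 48 - 64 ( 47 - 65)
INTEGRAL Likelihood = -2.60 Transmembrane 283 - 299 ( 283 - 300)
"""

segments = parse_segments(listing)
count = len(segments)
most_negative = min(s.likelihood for s in segments)
core_lengths = {s.end - s.start + 1 for s in segments}
print(count, most_negative, core_lengths)
```

In this listing every reported core segment spans 17 residues, so the start of a segment is its end position minus 16.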

Example 1620 

A DNA sequence (GBSx1715) was identified in S.agalactiae <SEQ ID 5001> which encodes the amino
acid sequence <SEQ ID 5002>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0151 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.



Query: 5 yERILIAIDGSYESELAVEKGINVALRNDAELLLTHVIDRHAYQSEGVFSDYVFDRQEQE 64 

Y IL+A+DGS +++ A+ K N A A+L + HVID+ ++ + + V E + 
Sbjct: 2 YNHILVAVDGSTQAKRALYKAFNYAKEFKADLFICHVIDSRSFATVEQYDRTWGAAELD 61 

Query: 65 SADV^YFEKLAHSKGLTKIKKITEIGNPKTLLAKDIPIREKADLIMVGATGLNTFERLL 124 

+L + + A G+ K+ I + G+PK +4K I + DLI+ GATGLN ER L 
Sbjct: 62 GKI<LLQRYSEEAEI\AGVDIWIiTILDFGSPKANISKTIAQKYDIDIiIITGATGIiNAVERFL 121 

Query: 125 IGSTSEYILRHSKVDMLWRDSK 147 

+GS SE + RH+K D+L+VR+ + 
Sbjct: 122 MGSVSESVARHAKCDVLIVRNDQ 144 

There is also homology to SEQ ID 3658: 

Identities = 105/150 (70%), Positives = 121/150 (80%) 

Query: 1 MTQKYERILIAIDGSYESEIAVEKGINVABRNDAELLLTHVIDAHAYQSEGVFSDYVFDR 60 

M+ KY+RIL+AIDGSYESEIA KG+NVALRNDA LLL HVID A QS F Y++++ 
Sbjct: 31 MSLKYKRILVAIDGSYESEIAFNKGYNVALRNDATJjLLVHVIDTRALQSVATFDTYIYEK 90 

Query: 61 QEQESADVIAYFEKIAHSKGLTKIKlCITEIGNPKTLIAiaJIPIREKADLrtW'GATGLNTF 120 

EQE+ DVL FEK A G+T IK+I E GNPK LLA DIP RE ADLIMVGATGLNTF 
Sbjct: 91 LEQEAKDVLDDFEKQAQIAGITNIKQIIEFGNPKNLLAHDIPDRENADLIMVGATGLNTF 150 

Query: 121 ERLLIGSTSEYILRHSKVDMLWRDSKKTL 150 

ERLLIGS+SEYI+RH+K+D+LWRDS KTL 
Sbjct: 151 ERLLIGSSSEYIMRHAKIDLLWRDSTKTL 180 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1621 

A DNA sequence (GBSx1716) was identified in S.agalactiae <SEQ ID 5003> which encodes the amino
acid sequence <SEQ ID 5004>. This protein is predicted to be glycerol uptake facilitator protein (glpF). 
Analysis of this protein sequence reveals the following: 

Possible site: 29 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -8.65 Transmembrane 261 - 277 ( 257 - 281)
INTEGRAL Likelihood = -5.73 Transmembrane 201 - 217 ( 199 - 222)
INTEGRAL Likelihood = -4.51 Transmembrane 92 - 108 ( 91 -
INTEGRAL Likelihood = -4.30 Transmembrane 44 - 60 ( 42 -
INTEGRAL Likelihood = -2.18 Transmembrane 15 - 31 ( 11 -
INTEGRAL Likelihood = -1.54 Transmembrane 150 - 166 ( 149 -

Final Results

bacterial membrane --- Certainty=0.4461 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA25231 GB:M58315 putative [Lactococcus lactis] 
Identities = 183/290 (63%) , Positives = 228/290 (78%) , Gaps = 10/290 (3%) 

Query: 2 IEITWTVKYITEFIATAFLI ILGNGAVANVDLKGTKGNNSGWI I IAIGYGLGVMMPALMF 61 

+++TWTVKYITEF+ TA LII+GNGAVANV+LKGTK + W+II GYGLGVM+PA+ F 
Sbjct: 1 MDVTWTVKYITEFVGTALLI IMGNGAVANVELKGTKAHAQSWMI IGWGYGLGVMLPAVAF 60 

Query: 62 GOTSGNHINPAFTLGLAFSGLFPWAHVGQYILAQILGAMFGQLVVvMVYQPYFVKTENPN 121 

GN++ + INPAFTLGLA SGLFPWAHV QYI+AQ+LGAMFGQL++VMVY+PY++KT+NPN 
Sbjct: 61 GNIT- SQINPAFTLGLAASGLFPWAHVAQYI IAQVLGAMFGQLLI VMVYRPYYLKTQNPN 119 

Query: 122 HVLGSFSTISALDDGQKSSRKAAYINGFLNEFVGSFVLFFGALALTKNYFGVE LVG 177 

+LG+FSTI +DD + +R A INGFLNEF+GSFVLFFGA+A T +FG + + 
Sbjct: 120 AILGTFSTIDNVDDNSEKTRLGATINGFLNEFLGSFVLFFGAVAATNIFFGSQSITWMTN 179 

Query: 178 KLVQAGYDQTTAATRISPYVTGSLA VAHLGIGFLVMTLVASLGGPTGPALNPARD 232 

L G D +++ +V S A +AHL +GFLVM LV +LGGPTGP LNPARD 

Sbjct: 180 YLKGQGADVSSSDVMNQIWVQASGASASKMIAHLFLGFLVMGLWALGGPTGPGLNPARD 239 

Query: 233 LGPRIvHRLLPKQILGQAKEDSKWWYAWVPVLAPIVASILAVALFKLLYL 282 

GPR+VH LLPK +LG+AK SKWWYAWVPVIAPI+AS+ AVALFK++YL 
Sbjct: 240 FGPRLVHSLLPKSVLGEAKGSSKWWYAWVPVLAPILASLAAVALFKMIYL 289 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5005> which encodes the amino acid 
sequence <SEQ ID 5006>. Analysis of this protein sequence reveals the following: 

Possible site: 16 

>>> Seems to have an uncleavable N-term signal seq



Likelihood = 

INTEGRAL Likelihood = 

INTEGRAL Likelihood = 

INTEGRAL Likelihood = 

INTEGRAL Likelihood = 

INTEGRAL Likelihood = 

INTEGRAL Likelihood = 



Transmembrane 293 - 309 ( 288 - 314) 

43 Transmembrane 2 - 18 ( 1 - 20) 

38 Transmembrane 233 - 249 ( 228 - 256) 

57 Transmembrane 124 - 140 ( 123 - 142) 

87 Transmembrane 76 - 92 ( 75 - 93) 

18 Transmembrane 47 - 53 ( 43 - 63) 

54 Transmembrane 182 - 198 ( 181 - 198) 



Final Results 

bacterial membrane --- Certainty=0.4673 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 



Query: 34 MEMTWTVKYITEFIATAFLIILGNGAVANVDLKGTKGHNSGWLVIAFGYGLGVMMPALMF 93

M++TWTVKYITEF+ TA LII+GNGAVANV+LKGTK H W++I +GYGLGVM+PA+ F
Sbjct: 1 MDVTWTVKYITEFVGTALLI IMGNGAVANVELKGTKAHAQSWMI IGWGYGLGVMLPAVAF 60 

Query: 94 GNVSGNHINPAFTVGLAVSGLFPWAHVLQYVVAQLLGAIFGQLVVVMVYKPYFMKTENPN 153

GN++ + INPAFT+GLA SGLFPWAHV QY++AQ+LGA+FGQL++VMVY+PY++KT+NPN 
Sbjct: 61 GNIT-SQINPAFTLGLAASGLFPWAHVAQYIIAQVLGAMFGQLLIVMVYRPYYLKTQNPN 119 

Query: 154 HVLGSFSTISSLDNGQKDSHKASYINGFLNEFVGSFVLFFGALALTKNYFGVELVGKLIE 213

Query: 214 AGYDQTTAATQISPYVTGSLA VAHIGIGFLVMVLVTSLGGPTGPALNPARD 264

A + QI +G+ A +AH+ +GFLVM LV +LGGPTGP LNPARD
Sbjct: 180 YLKGQGADVSSSDVMNQIWVQASGASASKMIAHLFLGFLVMGLVVALGGPTGPGLNPARD 239

Query: 265 FGPRLLHHFLPKSVLGQAKGDSKWWYAWVPVVAPILAAIVAVAAFKYLYI 314

FGPRL+H LPKSVLG+AKG SKWWYAWVPV+APILA++ AVA FK +Y+
Sbjct: 240 FGPRLVHSLLPKSVLGEAKGSSKWWYAWVPVLAPILASLAAVALFKMIYL 289



An alignment of the GAS and GBS proteins is shown below. 

Identities = 240/281 (85%) , Positives = 267/281 (94%) 

Query: 2 IEITWTVKYITEFIATAFLIILGNGAVANVDLKGTKGNNSGWIIIAIGYGLGVMMPALMF 61

+E+TWTVKYITEFIATAFLIILGNGAVANVDLKGTKG+NSGW++IA GYGLGVMMPALMF
Sbjct: 34 MEMTWTVKYITEFIATAFLIILGNGAVANVDLKGTKGHNSGWLVIAFGYGLGVMMPALMF 93

Query: 62 GNVSGNHINPAFTLGLAFSGLFPWAHVGQYILAQILGAMFGQLVVVMVYQPYFVKTENPN 121

GNVSGNHINPAFT+GLA SGLFPWAHV QY++AQ+LGA+FGQLVVVMVY+PYF+KTENPN
Sbjct: 94 GNVSGNHINPAFTVGLAVSGLFPWAHVLQYVVAQLLGAIFGQLVVVMVYKPYFMKTENPN 153

Query: 122 HVLGSFSTISALDDGQKSSRKAAYINGFLNEFVGSFVLFFGAIALTKNYFGVELVGKLVQ 181

HVLGSFSTIS+LD+GQK S KA+YINGFLNEFVGSFVLFFGAIALTKNYFGVELVGKL++
Sbjct: 154 HVLGSFSTISSLDNGQKDSHKASYINGFLNEFVGSFVLFFGAIALTKNYFGVELVGKLIE 213

Query: 182 AGYDQTTAATRISPYVTGSLAVAHLGIGFLVMTLVASLGGPTGPALNPARDLGPRIVHRL 241

AGYDQTTAAT+ISPYVTGSLAVAH+GIGFLVM LV SLGGPTGPALNPARD GPR++H
Sbjct: 214 AGYDQTTAATQISPYVTGSLAVAHIGIGFLVMVLVTSLGGPTGPALNPARDFGPRLLHHF 273

Query: 242 LPKQILGQAKEDSKWWYAWVPVLAPIVASILAVALFKLLYL 282

LPK +LGQAK DSKWWYAWVPV+API+A+I+AVA FK LY+
Sbjct: 274 LPKSVLGQAKGDSKWWYAWVPVVAPILAAIVAVAAFKYLYI 314



A related GBS gene <SEQ ID 8865> and protein <SEQ ID 8866> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 8
McG: Discrim Score: 2.81
GvH: Signal Score (-7.5): -3.6
Possible site: 23
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 6 value: -8.65 threshold: 0.0
INTEGRAL Likelihood = -8.65 Transmembrane 2…
INTEGRAL Likelihood = …
INTEGRAL Likelihood = -4.51
INTEGRAL Likelihood = …
INTEGRAL Likelihood = …
INTEGRAL Likelihood = …
PERIPHERAL Likelihood = …
modified ALOM score: 2.23
Transmembrane 150 - 166 ( 149 - …

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.4461 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

ORF01006(304 - 1146 of 1446)

EGAD|14239|14211(1 - 289 of 289) hypothetical 30.9 kd protein in pepx 5' region {Lactococcus
lactis} SP|P22094|YDP1_LACLC HYPOTHETICAL 30.9 KDA PROTEIN IN PEPX 5'REGION (ORF1).
GP|455286|gb|AAA25206.1||M35865 ORF1 (put.); putative {Lactococcus lactis}
GP|149527|gb|AAA25231.1||M58315 putative {Lactococcus lactis} PIR|B43747|B43747
hypothetical protein (pepXP 5' region) - Lactococcus lactis subsp. cremoris
PIR|B43748|B43748 hypothetical protein (pepX 5' region) - Lactococcus lactis subsp. lactis
%Match =37.5 

%Identity =64.4 %Similarity =81.3 
Matches = 183 Mismatches = 49 Conservative Sub.s = 48

123 153 183 213 243 273 303 333 

*YASRS***EI^IN*IK*STR*SEPSTLFFIK^IWLKILLILFC^ 

===1111111 

MDVTWTVKYI

10 



363 393 423 453 483 513 543 573 

TEFIATAFLIILGNGAVANVDLKGTKGMtfSGWIIIAIGYGLGVMMPALMFGNVSG 
|||: ||:|||:|||lll!|:||||l = 1=11 1111111=11= 111=: lllllllll II

TEFVGTALLIIMGNGAVAIWELKGTKAHAQSWMIIGK^^ 

20 30 40 50 60 70 80 



603 633 663 693 723 753 783 813 

IIAQILGAMFGQLVVVMVYQPYFVKTENPNHVLGSFSTISALDDGQKS

l=l|:||llllll==llll=ll==ll=lll =11=1111 =11 = =1 I 11111111=111111111=1 I =1 
IlAQVLGAMFGQLLIVMVYRPYYLKTQNPNAILGTFSTIDl^DNSEKTRLGATINGFLMIFLGSFVLFFGAVAATNIFF 
100 110 120 130 140 150 160 



831 861 885 906 936 966 996 1026

G VELVGKLVQAGYDQTTA- -ATRISPYVTG SLAVAHLGIGFLVMTLVASLGGPTGPALNPARDLGPRIVHRLL 

I = I I I : = = =1 =1 I HI I =11111 II =1111111 111111 = 111 = 11 II 
GSQSITWMTNYLKGQGADVSSSD\/MNQIWQASGASASKMIAHLFLGFLVMGLWALGGPTGPGTiNPARDB*GPRLVH,SIi^ 

180 190 200 210 220 230 240 


1056 1086 1116 1146 1176 1206 1236 1266 

PKQILGQAKEDSKWWYAWVPVLAPIVASILAVALFKLLYL* *LKKDRFTGLFLF* I*KSASLAS* FRLMMTFGHSFFKGR 

II .||:|| lllllllllllllbih llllll==ll 
PKSVLGFAKGSSKWWYAWVPVLAPILASLAAVALFKMIYIi 

260 270 280



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
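
The percentages printed in the alignment summaries can be re-derived from the residue counts, which gives a quick check for digits damaged in reproduction. The Python sketch below is illustrative only and is not part of the original analysis: `parse_summary` and `floor_percent` are hypothetical helper names, and rounding down to a whole number is an assumption about how the reporting program formats its percentages.

```python
import re

# Matches summary fields such as "Identities = 183/290 (63%)".
FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_summary(line):
    """Return {field: (count, length, reported_percent)} for one summary line."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)), int(m.group(4)))
        for m in FIELD.finditer(line)
    }


def floor_percent(count, length):
    """Whole-number percentage rounded down (assumed reporting convention)."""
    return (100 * count) // length


summary = parse_summary(
    "Identities = 183/290 (63%) , Positives = 228/290 (78%) , Gaps = 10/290 (3%)"
)
for field, (count, length, reported) in summary.items():
    # Flag any field whose printed percentage disagrees with the counts.
    status = "ok" if floor_percent(count, length) == reported else "check"
    print(field, count, length, reported, status)
```

Under this assumption the counts 183/290 and 228/290 reproduce the printed 63% and 78%; a mismatch elsewhere would only indicate that a line deserves a second look, not that it is wrong.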



Example 1622 

A DNA sequence (GBSx1717) was identified in S.agalactiae <SEQ ID 5007> which encodes the amino
acid sequence <SEQ ID 5008>. Analysis of this protein sequence reveals the following:



Possible site: …
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -8.70 Transmembrane 266 - 282 ( 262 - 290)
INTEGRAL Likelihood = -7.96 Transmembrane 25 - 41 ( 24 - 50)
INTEGRAL Likelihood = -6.42 Transmembrane 110 - 126 ( 105 - …)
INTEGRAL Likelihood = -6.26 Transmembrane 194 - … ( 190 - 215)
INTEGRAL Likelihood = -5.47 Transmembrane 290 - 306 ( 289 - 310)
INTEGRAL Likelihood = -4.35 Transmembrane 128 - 144 ( 127 - 147)
INTEGRAL Likelihood = -3.29 Transmembrane 157 - 173 ( … - 174)
INTEGRAL Likelihood = -2.76 Transmembrane 221 - 237 ( 221 - 240)

Final Results

bacterial membrane --- Certainty=0.… (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related sequence was also identified in GAS <SEQ ID 9177> which encodes the amino acid sequence
<SEQ ID 9178>. Analysis of this protein sequence reveals the following:
Possible cleavage site: 21
>>> Seems to have a cleavable N-term signal seq.
INTEGRAL Likelihood =-10.77 Transmembrane 139 - 155 ( 133 - 161)
INTEGRAL Likelihood = -8.28 Transmembrane 245 - 261 ( 240 - 269)
INTEGRAL Likelihood = -7.48 Transmembrane 269 - 285 ( 263 - 289)
INTEGRAL Likelihood = -7.06 Transmembrane 97 - … ( 83 - 125)
INTEGRAL Likelihood = -6.10 Transmembrane 173 - 189 ( 169 - 194)
INTEGRAL Likelihood = -1.44 Transmembrane 200 - 216 ( 200 - 217)

Final Results

bacterial membrane --- Certainty=0.531 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below.

Identities = 225/301 (74%) , Positives = 263/301 (86%)

Query: 10 LTVSLFFCRLDIMNETLLLHGIQLILIIAMIITFYQIVRHIRSQKINPFKRFFTGLWIGF 69

LT +FFC+L MNE L+L IQ +L+ AM+ F+ +V+H++ KINPFKRF+TG WIG
Sbjct: 1 LTAKVFFCKLVFMNEMLILRLIQALLVSAMLFIFFMLVKHLKKNKINPFKRFWTGFWIGL 60

Query: 70 VTDALDTLGIGSFATTTTFFKLTKLVEDDRKIPATMTAAHVLPVLLQSLCFIFVVKVEAL 129

+TDALDTLGIGSFATTTT FKLTKLV DDR++P TMT AHVLPVL+QSLCFIFVVKVE L
Sbjct: 61 LTDALDTLGIGSFATTTTCFKLTKLVTDDRQLPGTMTVAHVLPVLIQSLCFIFVVKVEVL 120

Query: 130 TLITMAGAAFIGAFVGAKMTKNWHAPTVQRILGTLLITAAIIMLYRMITNPGAGISDSVH 189

TL+ MA AAFIGA+ G +TKNWHAPTVQRILG+LLI AAIIM+ R+I +PG +SD++H
Sbjct: 121 TLLAMAAAAFIGAYFGTHITKNWHAPTVQRILGSLLIIAAIIMIIRIIYHPGEHLSDTIH 180

Query: 190 GLHGIWLFVGIGFNFIIGVLMTMGLGNYAPELIFFSLMGLSPAVAMPVMMLDAAMIMTAS 249

GLHGIWLFVGIGFNFI+GVLMTMGLGNYAPELIFFSLMGLSP VAMPVMMLDAAMIMTAS
Sbjct: 181 GLHGIWLFVGIGFNFIVGVLMTMGLGNYAPELIFFSLMGLSPTVAMPVMMLDAAMIMTAS 240

Query: 250 STQFIKSGRVNWGFAGLVTGGILGVIVAVLFLTNLDLNSLKTLVVGIVLFTGAMLIRSSF 310

S+QFIK+ RV+W+GFAG+V+GGI+GV++AV FLTNLD+NSLK LV+ IV FTG MLIRSSF
Sbjct: 241 SSQFIKANRVSWDGFAGIVSGGIIGVLLAVFFLTNLDINSLKLLVIAIVFFTGGMLIRSSF 301



A related GBS gene <SEQ ID 8867> and protein <SEQ ID 8868> were also identified. Analysis of this
protein sequence reveals the following:



Crend: 



-5.59 



Lipop: Possible site: -1 
McG: Discrim Score: 
GvH: Signal Score (-7.5) 

Possible site: 44
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 8 value: -8.
INTEGRAL Likelihood = -8.70
INTEGRAL Likelihood = -7.96
INTEGRAL Likelihood = -6.42
INTEGRAL Likelihood = -6.26
INTEGRAL Likelihood = -5.47
INTEGRAL Likelihood = -4.35
INTEGRAL Likelihood = -3.29
INTEGRAL Likelihood = -2.76

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.4482 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
A related DNA sequence was identified in S.pyogenes <SEQ ID 5009> which encodes amino acid sequence
<SEQ ID 5010>:



Possible site: 33
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = … Transmembrane 151 - …
INTEGRAL Likelihood = … Transmembrane 257 - 273
INTEGRAL Likelihood = … Transmembrane 281 - 297
INTEGRAL Likelihood = … Transmembrane 109 - 125
INTEGRAL Likelihood = …44 Transmembrane 212 - 228
INTEGRAL Likelihood = …
INTEGRAL Likelihood = …
INTEGRAL Likelihood = …

Final Results

bacterial membrane --- Certainty=0.5310 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS sequences follows: 

Identities = 228/301 (74%)

Query: 1 LTAKVFFCKLVFMNEMLILRLIQALLVSAMLFIFFMLVKHLKKNKINPFKRFWTGFWIGL 60

LT +FFC+L MNE L+L IQ +L+ AM+ F+ +V+H++ KINPFKRF+TG WIG
Sbjct: 10 LTVSLFFCRLDIMNETLLLHGIQLILIIAMIITFYQIVRHIRSQKINPFKRFFTGLWIGF 69

Query: 61 LTDALDTLGIGSFATTTTCFKLTKLVTDDRQLPGTMTVAHVLPVLIQSLCFIFVVKVEVX 120

+TDALDTLGIGSFATTTT FKLTKLV DDR++P TMT AHVLPVL+QSLCFIFVVKVE
Sbjct: 70 VTDALDTLGIGSFATTTTFFKLTKLVEDDRKIPATMTAAHVLPVLLQSLCFIFVVKVEAL 129

Query: 121 XXXXXXXXXFIGAYFGTHITKNWHAPTVQRILGSLLXXXXXXXXXXXXYHPGEHLSDTIH 180

FIGA+ G +TKNWHAPTVQRILG+LL +PG +SD++H
Sbjct: 130 TLITMAGAAFIGAFVGAKMTKNWHAPTVQRILGTLLITAAIIMLYRMITNPGAGISDSVH 189

Query: 181 GLHGIWLFVGIGFNFIVGVLMTMGLGNYAPELIFFSLMGLSPTVAMPVMMLDAAMIMTAS 240

GLHGIWLFVGIGFNFI+GVLMTMGLGNYAPELIFFSLMGLSP VAMPVMMLDAAMIMTAS
Sbjct: 190 GLHGIWLFVGIGFNFIIGVLMTMGLGNYAPELIFFSLMGLSPAVAMPVMMLDAAMIMTAS 249

Query: 241 SSQFIKANRVSWDXXXXXXXXXXXXXXXXXFFLTNLDINSLKLLVIAIVFFTGGMLIRSSF 301

S+QFIK+ RV+W+ FLTNLD+NSLK LV+ IV FTG MLIRSSF
Sbjct: 250 STQFIKSGRVIMStGFAGLVTGGILGVIVAVLFLTNLDLNSLKTLVVGIVLFTGAMLIRSSF 310



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1623 

A DNA sequence (GBSx1718) was identified in S.agalactiae <SEQ ID 5011> which encodes the amino
acid sequence <SEQ ID 5012>. This protein is predicted to be a C3-degrading proteinase. Analysis of this
protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2851 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD37110 GB:AF112358 C3-degrading proteinase [Streptococcus pneumoniae]



WO 02/34771 PCT/GB01/04789

Identities = 92/240 (38%) , Positives = 142/240 (58%) , Gaps = 11/240 (4%) 

Query: 12 FVIiRvNNRDIiNIAFYQESIjGFKLISEENMAVFSAWQHKERSFIIEESPTYRTRAVNGTK 71 

P L+ NNR LN FY E+LG K + EE+A E ++EE+P+ RTR V G K 

Sbjct: 11 PTLKRNNRKLNETFYIETLGMKALLEESAFLSLGDQTGLE - KLVLEEAPSMRTRKVEGRK 69 

Query: 72 KLAKIIVKSQDAKDIEKLLANGAQAIQWQGQNGYAYETVSPEGDLFLLHAEDDLSQLVA 131 

KLA++IVK ++ +IE +L+ ++Y+GQNGYA-E SPE DL L+HAEDD++ LV 

Sbjct: 70 KLARLIVKVENPLEIEGILSKTDS1HRLYKGQNGYAFEIFSPEDDLILIHAEDDIASLVE 129 

Query: 132 I-ERPELEKKDDTTGLSNFAFQSISLWPDAVKAEAFYDKVFAGKFPINLSFKEAQGQDL 190 

+ E+PE + + LS F S+ L++P + E+F + + + +L F AQGQDL 
Sbjct: 130 VGEKPEFQTDLASISLSKFEI-SMELHLPTDI- -ESFLE SSEIGASLDFIPAQGQDL 183 

Query: 191 QIAPIffiTWDIEILECOTNEDTNLNDLKSTFESLGLDVYLDSKEKILVlSDTSNIEIWISK 250 

+ TWD+ +L+ VNE ++ L+ FES + ++ EK + D +N+E+W + 
Sbjct: 184 TVDOTVTWJLSMLKFLVlffi-LDIASLRQKFES- -TEYFIPKSEKFFLGIQDRNNVELWFEE 240 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5013> which encodes the amino acid 
sequence <SEQ ID 5014>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3267 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 130/250 (52%), Positives = 177/250 (70%) 

Query: 1 MTLFHSLTFKHPA7LRVNNRDLNIAFYQESI/3FKLISEENAIAVFSAWQNKEASFIIEESP 60 

MTL ++TFK PVLRVN+RDLNIAFYQ +1X3 +L+SEENAIA+FS+W + F+IEESP 
Sbjct: 1 MTLMENITFKTPVLRVNDRDLNIAFYQNNLGLRLVSEENAIAIFSSVIGEGQECFV1EESP 60 

Query: 61 TYRTPAWGTKKIAKIIvKSQDAKDIEKLLANGAQAIQVYQGQNGYAYETVSPEGDLFLL 120 

+ RTRAV G KK+ I++K+ K+IE+LLA+GA +++GQNGYA+ET+SPEGD FLL 
Sbjct: 61 SVRTRAVEGPKKVNTIVIKTNQPKEIEQLLAHGAHYDALFKGQNGYAFETISPEGDRFLL 120 

Query: 121 HAEDDLSQLVAIERPELEKKDDTTGLSNFAFQSISIWPDAVKAEAFYDKVFAGKFPINL 180 

HAE D+ L + P LEK GL+ F F I IjNV +++AFY +F+ + PI + 

Sbjct: 121 HAEQDIKHLQGTDLPSLEKDATFKGLTQFKFDIIVLNVISEERSKAFYRDLFSDQLPITM 180 

Query: 181 SFKEAQGQDLQIAPNETWDIEILECCVNEDTNLNDLKSTFESLGLDVYLDSKEKILVISD 240 

F + +G DL I P+ WD+EILE V++D ++ LK+T E G VY+D K K+LV+SD 
Sbjct: 181 DFIQEEGPDIAIDPHIATOLEILEFQVSKDYDMKVLKATLEEDGHKVYIDKKHKVLVLSD 240 

Query: 241 TSNIEIWISK 250 

S IE+W +K 
Sbjct: 241 PSQIEVWFTK 250 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1624 

A DNA sequence (GBSx1719) was identified in S.agalactiae <SEQ ID 5015> which encodes the amino
acid sequence <SEQ ID 5016>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2510 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAC16441 GB:AL450165 putative esterase [Streptomyces coelicolor] 
Identities = 89/323 (27%), Positives = 143/323 (43%), Gaps = 51/323 (15%) 

Query: 10 NTVLEL1KEQIKDNLYHGASLAIY-ENGEWHEHYLGT IDGNEKVKAGLVYDLA 61 

+T+ EL+ E + + GA+ ++ G + GT +DG4+ V+DLA 
Sbjct: 2 STLAELLAEGREQRICSGAAWSVGGPQGPLDRGKTGTRCWDGPPLDGDD VWDLA 55 

Query: 62 SVSKWGVGTLLAKLVYQGTIDIDKPLRYYYPTFH HQTLTVRQLATHSSGIDPFIP- 117 

SV+K + G ++ LV +G + 4D + Y P + LTVRQL H+SGI +P 

Sbjct: 56 SVTKPIA-GLVVM^VERGALGLDDTVGGYLPDYRGGDKAELTVRQLLAHTSGIPGQVPL 114 

Query: 118 NRDQLNATQLKDAINHIKVLEDKSFK- -YTDINFLLLGFMLEEVLGDSLDKLFKRYIFTP 175 

RD L +A4- + + + Y+ F4+LG + E G+ L+ L +R + P 

Sbjct: 115 YRDHPTPAALLEAWLLPLTAQPGTRVQYSSQGFIVLGLIAEAAAGEPLEALVERLVCAP 174 

Query: 176 FQMKETSFGPRVEAVPTWGIND GIVHDPKAKVLGKHTGSAGLFSTIDDLQ 226 

+++T F P V D G VHD A VLG G AGLFST+ D++ 

Sbjct: 175 LGLRDTVFRPDAGRRARAVATEDCPWRGRRWGEVHDENAWLGGVGGHAGLFSTLADME 234 

Query: 227 RFSIHYL KDDFA-KPLWNNYSLSKSRSLAWD IDKDWINHT 265 

R + FA + L+ R+LAW + HT 

Query: 266 GYTGPFIALNYQKQAAAIFLTNR 288 

G+TG + ++ + A+ LTNR 
Sbjct: 295 GFTGTSLWVDPATRRYAVLLTNR 317 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3885> which encodes the amino acid 
sequence <SEQ ID 3886>. Analysis of this protein sequence reveals the following: 

Possible site: 28 



Final Results 

bacterial membrane --- Certainty=0.1532 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 174/302 (57%), Positives = 229/302 (75%), Gaps = 1/302 (0%) 

Query: 9 TNTVLELIKEQIKDNLYHGASIAIY3NGEWKEHYLGTIDGNEKVKAGLVYDIASVSKWG 68 

T V++ 1+ +■ +Y GASLA++++G W E+++GTIDG V A LVYDLASVSKWG 
Sbjct: 6 TLAVIKCIENHLHKKVYKGASLALFQSGRWQEYHIGTIDGRRPVDANLVYDLASVSKVVG 65 

Query: 69 VGTLIJUCLWQGTIDIDKPLRYYYPTFHHQTLTVRQLATHSSGIDPFIPNRDQLNATQLK 128 

V T+ L+ GT+ +D PL+ YYP+ T+T+RQIi TH+SG+DP+IPNRD LNA QL+ 
Sbjct: 66 VATICNILL1OTGTLALDDPLKVYYPSIADATVTIRQLLTHTSGLDPYIPNRDVLNAQQLR 125 

Query: 129 DAINHIKVLEDKSFKYTDINFLLI^FMLEEVIiGDSLDKLFKRYIFTPFQMKETSFGPRVE 188 

A+NH+ E+K+F YTD+HFLLLGFMLEE+ +SLD++F + 1FTPF M TSFGPR E 
Sbjct: 126 KABNHLTQKENKNFYYTDWFLLLGFMLEELFSESLDQIFDKTIFTPFGMYHTSFGPRPE 185 

Query: 189 AVPTWGINDGIVHDPICAKVLGKHTGSAGLFSTIDDLQRFSIHYLKDDFAKPLWNMYSLS 248 

AVPT+ G++DG VHDPKAK+L KH+GSAGLFST+ DL+ FS HYL D F+ LW NYS 
Sbjct: 186 AVPTLKGVSDGEVHDPKAKILIGCHSGSAGLFSTLADI^SFSNHYLNDPFSDCLWENYSQQ 245 



Query: 249 K- SRSLAWDIDKDWINHTGYTGPFI ALNYQKQAAAI FLTNRTFSYDDRPLWIKKRRHVQE 307 
RSL W++D DWI +HTGYTGPF+ LN ++Q AAIFLTNRT+ DD+ W+K+R+ + 
Sbjct: 246 TIERS^V^LDGDWISHTGYTGPFLMimKEX^^IFLTKRTYDEDDKSIOTLKERQLLYN 305

Query: 308 AI 309 
A+ 

Sbjct: 306 AL 307 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
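
The localisation calls in each "Final Results" block can be tabulated mechanically when many examples have to be compared. The following Python sketch is offered only as an illustration and is not part of the original analysis: `parse_final_results` is a hypothetical helper, and the pattern assumes the three-line layout used in this document while tolerating stray spaces inside the numbers.

```python
import re

# One result line: compartment name, certainty value, verdict in parentheses.
LINE = re.compile(
    r"bacterial\s+(membrane|outside|cytoplasm)\W*"
    r"Certainty\s*=\s*([\d. ]+?)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return {compartment: (certainty, verdict)} for one Final Results block."""
    results = {}
    for m in LINE.finditer(text):
        # Remove spaces that may split the number, e.g. "0 . 2510".
        certainty = float(m.group(2).replace(" ", ""))
        results[m.group(1)] = (certainty, m.group(3).strip())
    return results


block = """
bacterial cytoplasm Certainty=0 . 2510 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty= 0.0000 (Not Clear) < succ>
"""
results = parse_final_results(block)
# The compartment with the highest certainty is taken as the predicted location.
predicted = max(results, key=lambda name: results[name][0])
print(predicted, results[predicted])
```

Taking the maximum certainty as the call is a simplification chosen for this sketch; the verdict string is kept alongside it so that "Not Clear" entries remain visible.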

Example 1625 

A DNA sequence (GBSx1720) was identified in S.agalactiae <SEQ ID 5017> which encodes the amino
acid sequence <SEQ ID 5018>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0935 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA25177 GB:D21804 FMN-binding protein [Desulfovibrio vulgaris]
Identities = 53/124 (42%), Positives = 76/124 (60%), Gaps = 2/124 (1%) 

ML F +VLK EGW+I + E PH+ NTWKSYL + D RI+ P GM E ++ 
Sbjct: 1 MLPGTFFEVIjKNEGWaiATQGEDGPHLVNT(WSYLK\^DGIITOIWPVGGMHKTEANVAR 60 

Query: 61 NSKIIMTLGSREVEGRDGyQGTGFRIEGTAKLLEAGSDFEIVKEKYPFLRKVLEVTPINV 120 

+ +++MTLGSR+V GR+G GTGF I G+A G +FE + ++ + R L +T ++ 

Sbjct: 61 DERVLMTLGSRKVAGRNG-PGTGFLIRGSAAFRTDGPEFEAI-ARFKWARAALVI1WSA 118 

Query: 121 IQLL 124 
Q L 

Sbjct: 119 EQTL 122 

No corresponding DNA sequence was identified in S. pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1626 

A DNA sequence (GBSx1721) was identified in S.agalactiae <SEQ ID 5019> which encodes the amino
acid sequence <SEQ ID 5020>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3799 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 
Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1627 

A DNA sequence (GBSx1722) was identified in S.agalactiae <SEQ ID 5021> which encodes the amino
acid sequence <SEQ ID 5022>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3175 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10123> which encodes amino acid sequence <SEQ ID
10124> was also identified.

The protein has homology to a pyruvate formate-lyase from S.mutans: 



Query: 7 MATVKTNTDIFEQAWEGFKGVDWKEKASIARFVQANYAPYDGDESFLAGATERSLHIKKV 66

MATVKTNTD+FE+AWEGFKG DWK++ASI+RFVQ NY PYDG ESFLAG TERSLHIKKV
Sbjct: 1 MATVKTNTDVFEKAWEGFKGTDWKDRASISRFVQDNYTPYDGGESFLAGPTERSLHIKKV 60

Query: 67

+EETKMIYEETRFPMDTR+ SI++ + PAG+IDK+NELIFGIQNDELFKLNFMPKGGIRMAE
Sbjct: 61

Query: 127

Sbjct: 121

Query: 187

VYARLA+ YGADYLMQEKVNDWN+ + +IDEESIRLREEINLQYQALGEW+LGDLYG+DVR
Sbjct: 181

Query: 247

Sbjct: 241

Query: 307

Sbjct: 301

Query: 367

+PEPNLTVLWS +LPY+FR YCMSMSHKHSSIQYEGV+TMAKEGYGEMSCISCCVSPLDP
Sbjct: 361

Query: 427

FJSIED+RHNLQYFGARVNV+KALLTGIiNGGYDDWKDYKVFD++PIRDEVL+F+TVKANFE
Sbjct: 421

Query: 487

K+LDWLTDTYVDAMNIIHYMTDKYNYEAVQMAFLP+ V+ANMGFGICGF+NTVDSLSAIK
Sbjct: 481

Query: 547

YATVKPIRDEDGYIYDYETVG+FPRYGEDDDRVDS IAEWLLEAFH RLA+HKLYKD+EAT
Sbjct: 541

Query: 607 VSLLTITSNVAYSKQTGNSPVHKGVYLNEDGSVNLSKVEFFSPGANPSNKAKGGWLQNLN 666

Sbjct: 601 VSLLTITSNVAYSKQTGNSPVHKGVYLNEDGSVNLSKVEFFSPGANPSNKASGGWLQNLN 660

Query: 667 SLSKLDFAHANDGISLTTQVSPRALGKTFDEQVDNLVTVLDGYFENGGQHVNLNVMDLKD 726

SL KLDFAHANDGISLTTQVSP+ALGKTFDEQV NLVT+LDGYFE GGQHVNLNVMDLKD
Sbjct: 661 SLKKLDFAHANDGISLTTQVSPKALGKTFDEQVANLVTILDGYFEGGGQHVNLNVMDLKD 720

Query: 727 VYDKIMNGEDVIVRISGYCVNTKYLTPEQKTELTQRVFHEVLSMDDALTN 776

VYDKIMNGEDVIVRISGYCVNTKYLT EQKTELTQRVFHEVLSMDDA T+
Sbjct: 721 VYDKIMNGEDVIVRISGYCVNTKYLTKEQKTELTQRVFHEVLSMDDAATD 770

A related DNA sequence was identified in S.pyogenes <SEQ ID 5023> which encodes the amino acid 
sequence <SEQ ID 5024>. Analysis of this protein sequence reveals the following: 

Possible site: 59
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3184 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 701/773 (90%) , Positives = 742/773 (95%) , Gaps = 1/773 (0%) 

Query: 2 FKEKTMATVKTNTDIFEQAWEGFKGVDWKEKASIARFVQANYAPYDGDESFLAGATERSL 61 

FKEK MATVKTNTD+FE+AWEGFKG DWKEKAS++RFVQANY PYDGDESFLAGATERSL 
Sbjct: 5 FKEKFMATVTCTNTDVFEKAWEGFKGTDWKEKASVSRFVQANYTPYDGDESFLAGATERSL 64 

Query: 62 HIKKVIEETICAHYEETRFPMDTRVASISELPAGFIDKDNEL1FGIQNDELFKLNFMPKGG 121 

HI KKVTEETKAHYE TRFP DTR SI+++PAGFIDK+NELI+GIQNDELFKI1NFMPKGG 
Sbjct: 65 HIKKVIEETKMTyEATRFPYDTRPTSlADIPAGFIDK3NELIYGIQNDELFKIWFMPKGG 124 

Query: 122 IRMAETTLKENGYEPDPAVHEIFTKYATTVNDGIFRAYTSNIRRARHAHTVTGLPDAYSR 181 

IRMAETTLKENGYEPDPAVHEIFTKY TTVNDGIFRAYTSNIRRARHAHTVTGLPDAYSR 
Sbjct: 125 IFJ'^AETTLKENGYEPDPAvHEIFTKY^/TTvNDGIFRAYTSNIRRARHAHTVTGLPDAYSR 184 

Query: 182 GRIIGWARLAvYGADYLMQEKVNDWNAIiIC>IDEESIRLREEINLQYQALGEVVKLGDLY 241 

GRI I GVYARLA+ YGAD YLMQE KVNDWNA+ +IDEESIRIiREE+NLQYQALGEWKLGDLY 
Sbjct: 185 GRIIGWARLALYGADYLMQEKVNDTOAITEIDEESIRLREEVNLQYQALGEVVKLGDLY 244 

Query: 242 GVDWKPAMNTKEAIQPAttJIAFMAVCRVINGAATSLGRVPIVLDIFAERDLARGTFTESE 301 

GVDVR+PA N KEAIQWVNIAFMAVCRVINGAATSLGRVPIVLDIFAERDLARGTFTESE 
Sbjct: 245 GVDVRRPAQNVKEAIQVJvNIAFMAVCRVINGAATSLGRVPIVLDIFAERDLARGTFTESE 304 

Query: 302 IQEFVDDFVLKLRWKFARTKAYDALYSGDPTFITTSMAGMGADGRHROTKMDYRFLNTL 361 

IQEFVDDFVLKLRTVKF RTKAYDALYSGDPTFITTSMAGMG DGRHRVTKMDYRFLNTL 
Sbjct: 305 IQEFVDDFVLKXRWKFGRTKAYDALYSGDPTFITTSMAGMGNDGRHRVTKNDYRFIiHTL 364 

Query: 362 DNIGNSPEPNLTVLWSDQLPYAFRRYCMSMSHKHSSIQYEGVSTMAKEGYGEMSCISCCV 421 

DNIGNSPEPNLTVLW+DQLP FRRYCM MSHIvHSSIQYEGV+TMAKEGYGEMSCISCCV 
Sbjct: 365 DNIGNSPEPNLTVLm?DQLPETFRRYCMKHSHKHSSIQYEGVTTMAKEGYGEMSCISCCV 424 

Query: 422 SPLDPENEDKRHl^QYFGARvNWiKALLTGMGGYDDVHKDYKVFD-IDPIRDEvLNFDT 480 

SPLDPENE++RHN+QYFGARVNV+KALLTGLNGGYDDVH+DYKVF+ ++PI EVL +D 
Sbjct: 425 SPLDPENEEQRHNIQYFGARVNVLKALLTGINGGYDDVHRDYKVFNVVEPITSEvLEYDE 484 

Query: 481 VKANFEKSLDWLTDTYVDAMNIIHYMTDKYNYEAVQf^^ 540 

V ANFEKSLDWLTDTYVDA+NI IHYMTDKYNYEAVQMAFLP+H PANMGFGI CGFANTVD 
Sbjct: 485 VMANFEKSLDWLTDTYVDALNI IHYMTDKYNYEAVQMAFIiPTHQRANMGFGI CGFANTVD 544 

Query: 541 SLSAIKYATVKPIRDEDGYIYDYETVGDFPRYG3DDDRVDSIAEWLLEAFHGRLAKHKLY 600 

+LSAI KYATVK IRDE+GYIYDYE GDFPRY3EDDDRVD IA+WL+EA+H RIA HKLY 
Sbjct: 545 TLSAIiCfATVKTIRDENGYIYDYEVTGDFPRYGEDDDRVDDIAKWLMEAYHTRLASHKLY 604 

Query: 601 KDAFATOSLLTITSNVAYSKQTGNSPVHKGVYIJSIEDGSvNLSKVEFFSPGANPSNKAKGG 660 
K+ASA+VSLLTITSNVRYSKQfTGNSPVH+GV+USIEDG+VN S+VEFFSPGANPSNKARGG
Sbjct: 605 KWAEMVSLLTITSNVAYSKQTGNSPVHRGVFD^DGTVNTSQVEFFSPGANPSNKAKGG 664 

Query: 661 WLQNI^SLSKLDFAHAM3GISLTTQVSPIUULGKTFDEQVDNLVTVLDGYFENGGQHVNI^ 720 

WLQNI^SL+KL+F+HANDGISLTTQVSPRALGKTFDEQVD^ 
Sbjct: 665 WLQNDNSLAKIjEFSHAMDGISLTTQVSPRM^KTFDEQVDNLVTVIiDGYFENGGQHWLN 724 

Query: 721 VMDLKDVYDKIMNGEDVIVRISGYCVKTKYLTPEQKTELTQRVFHEVLSMDDA 773 

VMDL DVYDKIMNGEDVIVRISGYCVWTKYLTPEQKTELTQRVFHEVLSMDDA 
Sbjct: 725 VMDI^VYDKIMNGEDVIVRISGYCVNTKYLTPEQKTELTQRVFHEVLSMDDA 777 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1628 

A DNA sequence (GBSx1723) was identified in S.agalactiae <SEQ ID 5025> which encodes the amino
acid sequence <SEQ ID 5026>. This protein is predicted to be a DNA-damage inducible protein P (dinP).
Analysis of this protein sequence reveals the following:

Possible site: 31
>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10121> which encodes amino acid sequence <SEQ ID
10122> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 



Query: 12 INDTSRKIIHIDMDAFFASVEERDNPSLKGKPVIIGSDPRKTGGRGWSTCNYEARKFGV 71 

+ D RKIIH+DMD FFA+VE RDNP+ 4- + +G ++ RGV+STCNY+ARKFGV 
Sbjct: 1 MQDRIRKIIHVDMDCFFAAVEMRDNPAYREIALAVGGHEKQ RGVISTCNYQARKFGV 57 

Query: 72 HSAMSSKEAYERCPQAIFISGNYQKYRQVGMEVRDIFKKYTDLVEPMSIDEAYLDVTENK 131 

SAM + +A + CPQ + G Y+ V +++ IF++YT L+EP+S+DEAYLDV+E+ 
Sbjct: 58 RSAMPTAQALKLCPQLHWPGRMSVYKSVSQQIQTIFQRYTSLIEPLSLDEAYLDVSEST 117 

Query: 132 MGIKSAVKLMMIQYDIWMDVHLTCSAGISYNKFLAKLASDFEKPKGLTLILPDQAQDFL 191 

SA +A+ 1+ DIW +++LT SAG++ KFLAK+ASD KP GL ++ PD+ Q+ + 
Sbjct: 118 AYQGSATLIAQAIRRDIWQEMLTASAGVAPIKFIAKVASDLNKPDGLYVVTPDKVQEMV 177 

Query: 192 KPLPIEKFHGVGKRSVEKLHALGVYTGEDLLSLSEISLIDMFGRFGYDLYRKARGINASP 251 

LP+EK GVGK ++EKLH G+Y G D+ L+ FGR G L++K+ GI+ 

Sbjct: 178 DSLPLEKIPGVGKVALEKLHQAGLYVGADVRRADYRKLLHQFGRLGASLWKKSHGIDERE 237 

Query: 252 WPDRWKSIGSEKTYGKLLYNEADII<AEISI<N\ r QRWASLEI<NKKVGKTIV---LKVRY 308 

V +R RKS+G ET+++ + I++ ++ +1+ +KV++ 

Sbjct: 238 WTERERKSVGVEYTFSQNISTFQECWQVIEQKLYPELDARLSRAHPQRGIIKQGIKVKF 297 

Query: 309 ADFETLTKRMTLEEYTQDF--QIIDQVAKAIFDTLEESVFGIRLLGVTV 355 

ADF+ T D+ ++++QV + IRLLG++V 

Sbjct: 298 ADFQQTTIEHVHPALELDYFHELLEQV LTRQQGREIRLLGLSV 340 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5027> which encodes the amino acid 
sequence <SEQ ID 5028>. Analysis of this protein sequence reveals the following: 

Possible site: 27 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1921 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 276/363 (76%) , Positives = 323/363 (88%) 

Query: 6 MLIFPLINDTSRKIIHIDMDAFFASVEERDNPSLKGKPVIIGSDPRKTGGRGWSTCNyE 65 

ML I FPL INDTSRKI IH IDMDAFFA+ VEERDNP + LKGKPV+ IG DPR+TGGRGWSTCNYE 
Sbjct: 1 MLIFPLINDTSRKIIHIDMDAFFAAVEERDNPALKGKPWIGKDPRETGGRGWSTCNYE 60 

Query: 66 ARKFGVHSAMSSKEAYERCPQAIFISGNYQKYRQVGMEVRDIFKKYTDLVEPMSIDEAyL 125 

ARK+G+HSAMSSKEAYERCP+AIFISGNY+KYR VG ++R I FK4- YTD+ VEPMS I DEAYL 
Sbjct: 61 ARKYGIHSAMSSKEAYERCPKAIFISGNYEKYRTVGDQIRRIFKRYTDWEPMSIDEAyL 120

Query: 126 DVTENKMGIKSAVKLAmiQYDIVffirovHLTCSAGISYNKFLAKLASDFEKPKGLTLILPD 185 

DVT+NK+GIKSAVK+AK+IQ+DIW +V LTCSAG+SYNKFLAKLASDFEKP GLTL+L + . 
Sbjct: 121 DVTDNIOiGIKSAVKIAKLIQHDIWKEVGLTCSAGVSYNKFLAKLASDFEKPHGLTLVLICE 180 

Query: 186 QAQDFLKPLPIEKFHGVGKRSVEKLHALGVYTGEDLLSLSEISLIDMFGRFGYDLYRKAR 245 

A FL LPIEKFHGVGK+ SV+KLH +G+YTG+DLL++ E++LID FGRFG+DLYRKAR 
Sbjct: 181 DALCFLAKLPIEKFHGVGKKSVKKLHDMGIYTGQDLLAVPEMTLIDHFGRFGFDLYRKAR 240 

Query: 246 GINASPVKPDRWK3IGSEKTYGKLLYNEADIKAEISKNVQRWASLEKNKKVGKTIVLK 305 

GI+ SPVK DR+RKS IGSE+TY KLLY E DIKAEISKNV+RV A L+ +KK+GKTIVLK 
Sbjct: 241 GISNSPVKYDRIRKSIGSERTYAKIiLYQETDIKAEISKNVKRVAALLQDHKKLGKTIVLK 300 

Query: 306 WYADFETLTKRMTLEEYTQDFQI1DQVAKAIFDTLEESVFGIRLLGVTVTTLENEHEAI 365 

VRYADF TLTKR+TL E T++ I+QVA IFD+L E+ GIRLLGVT+T LE++ I 
Sbjct: 301 VRYADFTTLTKRVTLPELTRNARQIEQVAGDIFDSLSENPAGIRLLGVTMTNLEDKVADI 360 

Query: 366 YLD 368 
LD 

Sbjct: 361 SLD 363 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1629 

A DNA sequence (GBSx1724) was identified in S.agalactiae <SEQ ID 5029> which encodes the amino
acid sequence <SEQ ID 5030>. Analysis of this protein sequence reveals the following:

Possible site: 41
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood =-13.11 Transmembrane 70 - 86 ( 58 - 92)
INTEGRAL Likelihood = -5.20 Transmembrane 105 - 121 ( 100 - 123)
INTEGRAL Likelihood = -4.25 Transmembrane 126 - 142 ( 123 - 144)
INTEGRAL Likelihood = -2.71 Transmembrane 18 - 34 ( 18 - 34)

Final Results

bacterial membrane --- Certainty=0.6243 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5031> which encodes the amino acid
sequence <SEQ ID 5032>. Analysis of this protein sequence reveals the following:

Possible site: 32

Final Results

bacterial membrane --- Certainty=0.6201 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
An alignment of the GAS and GBS proteins is shown below. 

Identities = 57/155 (36%) , Positives = 96/155 (61%) , Gaps = 5/155 (3%) 

Query: 1 MVSYEKVRRSLRTATITIIVLNSLSLVFRLFTGISVQLAKTEI-NKGNTGNLPKEHIEAV 59 

M+SYEKVR+4L+T+TI II+LN L +V L + ++++ N+ L E + + 

Sbjct: 1 MI S YEKVRQALKTST I AI I ILNGLGWLS LMG FAG I FYLQSQLKNEAFRAQLTTEQLAQL 60 

Query: 60 LSATTPFMLFVTALIVLWIAIVIFCIKMjRAIKRNQTVNYLPYYLGFAITVGLVILGFL 119 

S+ TPFM+F++ L VL IAI++FC +NL +K+ TV+Y+PY LG ++V ++ F 
Sbjct: 61 QSSMTPFMIFLSVLNVLAIIAI IVFCAQNLSKLKQGLTVSYIPYILGLILSVIGLVNQFT 120 

Query: 120 TTKAPWAIAINI VFQAI FGLLYFHAYQKAQKLNER 154 

TT + + ++ A++G A+ KA+ LNE+ 

Sbjct: 121 TTMSMVGTILILIQAALYGF AFYKAKTLNEK 151 

SEQ ID 5030 (GBS227) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 119 (lane 5; MW 21.2kDa). 

GBS227-His was purified as shown in Figure 227, lane 8-9. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1630 

A DNA sequence (GBSx1725) was identified in S.agalactiae <SEQ ID 5033> which encodes the amino
acid sequence <SEQ ID 5034>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.1224 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14706 GB:Z99118 similar to conjugation transfer protein
[Bacillus subtilis]
Identities = 328/754 (43%), Positives = 484/754 (63%), Gaps = 25/754 (3%)

Query: 2 EVFFTGTIERI IFENASNFFKILLLEIEDTDSDFDDVEVI ITGTMADVIEGEEYTFWGTL 61 
E + GT+ +I+ N +N + +L +++ +T +D V +TG + E E YTF+G +

Sbjct: 13 EPYLKGTVNTVIYHNDTNLYTVLKVKVTETSEAIEDKAVSVTGYFPALQEEETYTFYGKI 72 



Query: 62 TQHPKYGEQLQSVRYERAKPTSG-GLVKYFSSEQFKGIGKKTAQRIVELYGDNTIDKILE 120 
HPK+G Q Q+ +++ PT+ G+++Y SS+ F+GIGKKTA+ IV+ GD+ I+KIL 
Sbjct: 73 VTHPKFGLQFQAEHFKKEIPTTKEGIIQYLSSDLFEGIGKKTAEEIVKKLGDSAINKILA 132

Query: 121 SPEQLSTISGLSKINREaFIAKLKLNYGTEQVIAKLAEYGLSNRmiQIFDHYKEESLEV 180

L + hSK + L+ + G EQ++ L ++G + Y+ E+LE 

Sbjct: 133 DASVLYDVPRLSKKKADTLAGALQRHQGtEQIMISLNQPGFGPQLSMKIYQAYESETLEK 192 

Query: 181 INENPYQLVEDIQGIGFKIADQLAEQVGIESDSPKRFRAAIIHTLVESSMEQGDTYIEAR 240 

I ENPYQLV+D++GIGF AD+L ++G+ + P+R +AAI++TL + + +G TYIE 
Sbjct: 193 IQENPYQLVKDVEGIGFGKADELGSRMGLSGNHPER^/KAAILYTLETTCLSEGHTYIETE 252 

Query: 241 TLLEKTITLLEEA RQIELDPS IVAKELTNLIAEDKVQHIGTKIFSNTLFFAE 292 

L+ T +LL ++ R E+D + I E +++ ED + + +LF+AE 

Sbjct: 253 QLIIDTQSLLKQSAREGQRITEMDAANAIIALGENKDIVIEDG RCYFPSLFYAE 306 

Query: 293 EGIKraLQRII.NQP-LDKQLNHKDIDREIRDIQKSLl^IHYDNIQEKAlREALLSKVFILT 351 

+ + K++I+Q + Q + + ++++ +++ Y Q++AI++AL S + +LT 
Sbjct: 307 QNVAKRVKHIASQTEYENQFPESEFLIjALGELEERMDVQYAPSQKEAIQKALSSPMLLLT 365 

Query: 352 GGPGTGKTTVINGIIEAYSELHHIDLN KND--IPIVLAAPTGRAARRMNELTGLPS 405 

GGPGTGKTTVI GI+E Y ELH + L+ K D PIVLAAPTGRAA+RM+E TGLP+ 
Sbjct: 367 GGPGTGKTTVIRGIVELYGELHGVSLDPSAYKKDEAFPIVIAAPTGRAAKRMSESTGLPA 426 

Query: 406 ATIHRHLGLNGDSDYQSLDDY-LDCSLIIIDEFSMVDTWLANQLFDALDSHTQVIIVGDS 464 

TIHR LG NG + +D ++ L+IIDE SM+D WLAN LF A+ H Q+IIVGD 
Sbjct: 427 VTIHRLLGWNGAEGFTHTEDQPIEGKLLIIDEASMLDIWLANHLFKAIPDHIQIIIVGDE 486 

Query: 465 DQLPSVGPGQVLADLMIim.PHVKLEKIFRQSEESTIVTLANQMRQGFLPEDFTAKKAD 524 

DQLPSVGPGQVL DLL +P V+L I+RQ+E S+IV LA+QM+ G LP + TA D 
Sbjct: 487 DQLPSVGPGQVLRDLLASQVIPTVRLTDIYRQAEGSSIVELAHQMiaiGLLPNNLTAPTKD 546 

Query: 525 RSYFEASANIIPNMISKIVQSALKSGIEAHEIQILAPMYRGQAGINNLKLIMQNLIiNPLK 584 

RS+ + I ++ K+V +ALK G A +IQ+LAPMYRG+AGIN LN+++Q++LNP K 

Sbjct: 547 RSFIRCGGSQIKEVVEKWANALKKGYTAKDlQVLAPMYRGKAGIlffiljNVMLQDIIiNPPK 605 

Query: 585 D-NNQFTFNDINFRIGDKVLHLVHDTELNVFNGDIGYITDLIPAKYTESKQDEIYMTFDG 643 

+ + F D+ +R GDK+L LVN E NVFNGDIG IT + AK K+D ++FDG 
Sbjct: 607 EKRRELKFGDVVYRTGDKILQliWQPENISrVFNGDIGEITSIFYAKENTEKEDMAVVSFDG 665 

Query: 644 QEVIYQRKEWLKITLRYAMS1HKSQGSEFQWILPITRQSGRMLQRNLIYTAITRSKSKL 703 

E+ + +K++ + T AY SIHKSQGSEF +V+LP+ + RML+RNL+YTAITR+K L 
Sbjct: 667 NEMTFTKKDWQFTHAYCCSIHKSQ^SEFPIVVLPVVKGYYRMLRRNLLYTAITRAKKFL 726 

Query: 704 ILLGEIGAFDFAVKNEGAK-RNTYLIERFENKQE 736 

IL GE A ++ VKN A R T L R + E 
Sbjct: 727 ILCGEEEALEWGVKNWDATVRQTSLKNRLSVQVE 760 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5035> which encodes the amino acid 
sequence <SEQ ID 5036>. Analysis of this protein sequence reveals the following: 

Possible site: 47

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

RGD motif: 232-234 

The protein has homology with the following sequences in the databases: 

>GP:CAB14706 GB:Z99118 similar to conjugation transfer protein 
[Bacillus subtilis] 

Identities = 318/769 (41%), Positives = 473/769 (61%), Gaps = 29/769 (3%)

Query: 7 GTVDRIIFENQANFFKILLLAIEDTDSDIDDFEIIITGTMADIIEGDDYTFWGELTQHPK 66 

GTV+ +I+ N N + +L + + +T I+D + +TG + E + YTF+G++ HPK
Sbjct: 18 GTVNTVIYHNDTNLYTVLKVK^/TETSEAIEDKAVSVTGYFPALQEEETYTFYGKIVTHPK 77

An alignment of the GAS and GBS proteins is shown below.

Identities = 544/816 (66%), Positives = 665/816 (80%), Gaps = 10/816 (1%) 

Query: 1 MEVFFTGTIERIIFENASNFFKILLLEIEDTDSDFDDVEVIITGTMADVIEGEEYTFWGT 60 

ME FTGT++RIIFEN +NFFKILLL IEDTDSD DD E+IITGTMAD+IEG++YTFWG 
Sbjct: 1 MEYVFTGTVDRIIFENQANFFKILLLAIEDTDSDIDDFEIIITGTMADIIEGDDYTFWGE 60 



Sbjct: 61 LTQHPKYGQQLKLSRYQKIKPSSSGV/NYFSSDHFKGIGKKTAEKIIALYGHNTIDHILE 120 

Query: 121 SPEQLSTISGLSKIlTOEAFIAKLiaJITYGTEXJVIAKIi&EYGLSNRAAIQIFDHYKEESLEV 180 

P +L TISGLSK NR+AF+AKLKLNYGTEQ++A L E GLSNR A+Q F+ YKEE+L++ 
Sbjct: 121 DPSKLETISGLSKANRQAFVAKLKLNYGTEQLIAGLVELGLSNRFALQAFEKYKEEALDL 180 

Query: 181 INENPYQLVEDIQGIGFKIADQLAEQVGIESDSPKRFRAAIIHTLVESSMEQGDTYIEAR 240 

+ ENPYQLVED+QG GFK+AD LAE +GIESDSPKRFRAA++H L+E S+ +GDTY++AR 
Sbjct: 181 VKENPYQLVEDLQGFGFKMADAIiAENLGIESDSPKRFRAALLHCLLEESINRGDTYVQAR 240

Query: 241 TLLEKTITLLEEARQIELDPSIVAKELTl-JLiaEDKVCHIGTKIFSHTLFFftEEGIKKNLQ 300

LL+ ITLLE+ARQ+E DP+ VA++L+ LI E K+++ TK+F +L+FAEEGI N+ 
Sbjct: 241 QLLDFAITLLEDARQVECDPAAV.^SQLSELIIEGKIKNSDTKLFDASLYFAEEGIANNIS 300 

Query: 301 RILNQPLDKQLNHKDIDREIRDIQKSLNIHYDNIQEKAIREALLSKVFILTGGPGTGKTT 360 

R+L+ PL + +H I 1+ +QK I YD +Q++AI +AL SKVF+LTGGPGTGKTT 
Sbjct: 301 RLLDTPLSQSFSHDTIQTTIQAVQKDFAITYDQVQQEAITKALTSKVFLLTGGPGTGKTT 360 

Query: 361 VINGlIFAYSELHHIDLNKMDIPIVI^aM'TGRJUlRRMNELTGLPSATIHRHLGLNGDSDY 420 

VI GI++AY+ LH IDL+K D+PI+LAAPTGRAARRMNELTGLPSATIHRHLGLNGD+DY 
Sbjct: 361 VIRGILQAYANLHQIDLDKKDLPILLAAFTGRAARRMNELTGLPSATIHRHLGLNGDNDY 420 

Query: 421 QSLDDYLDCSLI I IDEFSMVDTWLANQLFDALDSHTQVI I VGDSDQLPSVGPGQVLADLL 480 

Q+++DYLDC L+I+DEFSMVDTWLANQL A++S TQVI IVGDSDQLPSVGPGQVL+DLL 
Sbjct: 421 QAMEDYLDCDLLIVDEFSMVDTWLANQLLGAINSTTQVIIVGDSDQLPSVGPGQVLSDLL 480 

Query: 481 NINALPHVKLEKIFRQSEESTIVTLANQ^QGFLPEDFTAKKADRSYFEASANIIPIIMIS 540 

+N+LP + L+KIFRQS+ESTIV LA+QMR+G L DF KKADRSYFEA A IP+MI 
Sbjct: 481 KVNSLPQIALQKiroQSQESTIVNLADQMRRGILAADFRDKKADRSYFEAQAAFIPDMIQ 540 

Query: 541 KIVQSALKSGIEAHEIQIIAPMYRGQAGINMjNLIMQISrLLNPLKDHNQFTFNDIMFRIGD 600 

KIV SA+KSGI A EIQILAPMY+GQAGIN+LN +MQ LLNPL+ +F FND +FR GD 
Sbjct: 541 KIVLSAIKSGIPAEEIQ1LAPMYKGQAGINHLNQLMQELLNPLQGQTEFLFNDTHFRKGD 600 

Query: 601 KVLHLVNDTELNVFNGDIGYITDLIPAKYTESKQDEIYMTFDGQEVIYQRKEWLKITLAY 660 

KVLHLWD +LNVFNGDIGYITDLIPAKYTESKQDE+ + FDG EV Y R EWLK+TLAY 
Sbjct: 601 KVLHLVNDAQLNVFNGDIGYITDLIPAKYTESKQDELILDFDGSEVTYPRNEV'JLKLTLAY 660 

Query: 661 AMSIHKSQGSEFQWILPITRQSGRMLQRNLIYTAITRSKSKLILLGEIGAFDFAVKNEG 720 

AMSIHKSQGSEFQWILPITRQSGR+LQRN+IYTAITRSKSKLILLGE AF++A+K+EG 
Sbjct: 661 AMSIHKSQGSEFQWILPITRQSGRLLQRNVIYTAITRSKSKLILLGEYTAFEYAIKHEG 720 

Query: 721 AKRNTYLIERFENKQEIANSQKIEDSSIDQKI DNTIINTSIPKTATPIEQ 770 

KR TYLIERF+ + ++A+SQ ++ ++ D++ ++S + P E 

Sbjct: 721 DKRQTYLIERFQEQSDIASSQPNQELKSKEQTSLFSNTATLEDDSQKSSSQSTNSNPTEN 780 

Query: 771 TNLSKITYRLTEENYLTIDPMIGINQQDISAIFDSK 806 

+ +RLT ENY TID MIG+ + DI+ F K 

Sbjct: 781 SQSDNDDFRLTPENYSTIDSMIGLTESDIALFFQKK 816 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1631 

A DNA sequence (GBSx1726) was identified in S.agalactiae <SEQ ID 5037> which encodes the amino
acid sequence <SEQ ID 5038>. Analysis of this protein sequence reveals the following:

Possible site: 35

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -8.23 Transmembrane 9 - 25 ( 7 - 29)

Final Results

bacterial membrane --- Certainty=0.4291 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB69116 GB:U90721 signal peptidase I [Streptococcus pneumoniae] 
Identities = 120/201 (59%), Positives = 144/201 (70%), Gaps = 9/201 (4%)

Query: 2 KEFIKEWGVFILILSLFLLSRIFLWQFTO/DGHStCIPTLADICEQLvVLKQTKINRFDIW 61 

K F+KEWG+F+LILSL LSRIF W V+V+GHSMDFTLAD E L V+K I+RFDIW 
Sbjct: 5 KNFLKEWGLFLLILSLLALSRI FFKSNVRVEGHSMDFTIADGEILFWKHLPIDRFDIVV 64

Query: 62 J^EEGGQKKKIVKRVIGMPGDVIKYKI^LTMKKTEEPYLKEYTKLFKKDKLQEKYS 121

A+EE+G K IVKRVIGMPGD I+Y+ND L IN+K+T+EPYL +Y K FK DKLQ YS 
Sbjct: 65 AHEEDG- -NKDIVKRVIGMPGDTIRYENDKLYINDKETDEPYLADYIKRFKDDKLQSTYS 122 

F+ +AQ + AFT D N ++ F+ VP+G Y L+GDDR+VS DSR VG F 
Sbjct: 123 GKGFEGNKGTFFRSIAQKAQAFTVDVNYNTNFSFTVPEGEYLLLGDDRLVSSDSRHVGTF 182 

Query: 175 KKST I VGEVKFRFWP I RRFGT 195 

K I GE KFRFWPI R GT 
Sbjct: 183 KAKDITGEAKFRFWPITRIGT 203 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5039> which encodes the amino acid 
sequence <SEQ ID 5040>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -2.50 Transmembrane 35 - 51 ( 35 - 51)

Final Results

bacterial membrane --- Certainty=0.1999 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related sequence was also identified in GAS <SEQ ID 9157> which encodes the amino acid sequence
<SEQ ID 9158>. Analysis of this protein sequence reveals the following:

Possible site: 43 
>>> Seems to have a cleavable N-term signal seq. 

Final Results

bacterial outside --- Certainty=0.300 (Affirmative) < succ>

bacterial membrane --- Certainty=0.000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 131/197 (66%), Positives = 152/197 (76%)

Query: 1 MKEFIKEWGVFILILSLFLLSRIFLWQFVICVDGHSMDPTIADKEQLVVLKQTKINRFDIV 60 
MK+FIKEWG F L L LF LSR+FLWQ VKVDGHSMDPTLA E+L+V Q +I+RFDIV 
Sbjct: 23 MKQFIKEWGPFTLFLILFGLSRLFLWQAVKVDGHSI>1DPTLAHGERLIVFNQARIDRFDIV 82

Query: 61 v7ANEEEGGQKKKIVKRVIGMPGDVIKY^lDTLTItttIKKTEEPYLKEYTKLFKKDKLQEKY 120

VA EEE GQKK+IVKRVIG+PGD I Y +DTL IN KKT EPYL EY K FK DKLQ+ Y 
Sbjct: 83 VAQEEENGQIO03IVTOVIGLPGDTISYNDDTLYINGKKTVEPYLAEYLKQFKNDKLQKTY 142 


Query: 121 SYNPLFQDLAQSSTAFTTDSNGSSEFTTVVPKGHYYLVGDDRIVSKDSRAVGPFKKSTIV 180 

+YN LFQ LA++S AFTT+S G + F VPKG Y L+GDDRIVS+DSR VG FKK ++ 
Sbjct: 143 AYNTLFQQIAETSDAFTTNSEGQTRFEMSVPKGEYLLLGDDRIVSRDSREVGSFKKENLI 202 

Query: 181 GEVKFRFWPIRRFGTIN 197

GEVK RFWP+ + N 
Sbjct: 203 GEVKARFWPLNKMTVFN 219 

SEQ ID 5038 (GBS268) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 54 (lane 4; MW 50.3kDa). It was also expressed in E.coli as a His-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 85 (lane 9; MW 25.3kDa) and in Figure
160 (lane 2-4; MW 25.3kDa).



GBS268-His was purified as shown in Figure 222, lane 8.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1632 

A DNA sequence (GBSx1727) was identified in S.agalactiae <SEQ ID 5041> which encodes the amino
acid sequence <SEQ ID 5042>. This protein is predicted to be ribonuclease HIII (rnhB). Analysis of this
protein sequence reveals the following:

Possible site: 37 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4728 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10119> which encodes amino acid sequence <SEQ ID
10120> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC45437 GB:U93576 ribonuclease HII [Streptococcus pneumoniae] 
Identities = 176/282 (62%), Positives = 219/282 (77%), Gaps = 13/282 (4%) 

Query: 16 EKIRTDrAQHHISNmPYWFSAKISGA.TVX,LYTSGKLVFOGSNASHIAQKYGF- - IEQK 73
E +T LA + NPY+ + K+ ATV +YTSGK++ QG A A +G+ +EQ
Sbjct: 18 EHYQTSLAP SKNPYIRYFLKLPQATVSIYTSGKILLQGEGAEKYASFFGYQAVEQ- 72

Query: 74 ESCSSESQDIPIIGTDEVGNGSYFGGLAWASFVTPKDHAYLKICLGVGDSKTLTDQKIKQ 133
+ Q++P+ IGTDEVGNGSYFGGLAWA+FVTP H +L+KLGVGDSKTLTDQKI+Q
Sbjct: 73 TSGQNLPLIGTDEVGNGSYFGGLAWAAFVTPDQHDFLRKLGVGDSKTLTDQKIRQ 128

Query: 134 IAPLLEKAIPHKALLLSPQKYHQWSPNNKHNAVSVKVALHKQAIFLLLQDGFEPEKIVI 193
IAP+L++ I H+ALLLSP KYN+V+ +++NAVSVKVALHNQAI+LLLQ G +PEKIVI
Sbjct: 129 IAPILKEKIQHQALLLSPSKYNEVIG- - DRYNAVSVKVALHNQAI YLLLQKGVQPEKI VI 186

Query: 194 DAFTSSKNYQNYLKIffiKNQFKQTITLEEKAENKYUiVAVSSlIARm,FLENLNKLSDDVG 253
DAFTS+KNY YL E N+F I+LEEKAE KYLAVAVSS+IAR+LFLENL L ++G
Sbjct: 187 DAFTSAKNYDKYLAQETNRFSNPISLEEKAEGI<YtAVAVSSVIARDLFLENLENLGRELG 246



Query: 254 YKLPSGAGHQSDKVASQLLKAYGISSLEHCAKLHFANTKKAQ 295 

Y+LPSGAG SDKVASQ+L+AYG+ L CAKLHF NT+KA+ 
Sbjct: 247 YQLPSGAGTASDECVASQILQAYGiVJQGLNFCAKLHFKNT'EKAK 288 



A related DNA sequence was identified in S.pyogenes <SEQ ID 5043> which encodes the amino acid 
sequence <SEQ ID 5044>. Analysis of this protein sequence reveals the following: 



Possible site: 35

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2148 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 194/298 (65%), Positives = 240/298 (80%), Gaps = 2/298 (0%)



Query: 3 MNTIVMQADICKLQEKIRTDIAQHHISNIOTPYWFSAKISGATVLLYTSGICLVFQGSNASH 62 
MNT+V++ D L + ++ LA + IS+ N YV F+AK +G TVLLY SGKLV QG+ A+

Sbjct: 1 MIMLVLKIDAILSKHLKKQLAPYTISSQNTYVAFAAKKNGVTVLLYKSGKLvLQGNGANA 60

Query: 63 IAQKyGFIEQKE--SCSSESQDIPIIGTDEVGNGSYFGGLAWASFVTPKDHAYLKKLGV 120

+AQ+ K S+ SQDIPIIG+DEVGNGSYFGG+AWASFV PKDH++LKKLGV 

Sbjct: 61 LAQEtiNLPVAKTVFEASNNSQDIPIIGSDEVGNGSYFGGIAWASFVDPKDHSFLKKLGV 120 

Query: 121 GDSKTLTDQKIKQIAPLLEKAIPHKAIJiSPQKYNQWSPHNKHNAVSVKVALHNQAIFL 180 

DSK L+D+ I+QIAPLLEK IPH++LLLSP+KYN++V + +NA+S+KVALHNQA1FL 
Sbjct: 121 DDSKKLSDKTIQQIAPLLEKQIPHQSLLLSPKKYNELVGKSICPYNAISIKVALHNQAIFL 180 

Query: 181 LLQDGFEPEKIVIDAFTSSKireQlWLK^KNQFKQTITLEEKAENKYIAVAVSSlIABML 240 

LLQ G +P++IVIDAFTS NY+ H-LK EKN F +T +EKAE+ YLAVAVSSIIARNL 
Sbjct: 181 LLQKGIQPKQIVIDAFTSQSNYEKHLKKEKNHFPNPLTFQEKAESHYLAVAVSSIIARNL 240 

Query: 241 FLENLNKLSDDVGYKLPSGAGHQSDXVASQLLKAYGISSLEHCAKLHFANTKKAQALL 298 

FL+NL++L D+GY+LPSGAG SDKVASQLL AYG+SSLE+ AKLHFANT KAQALL 
Sbjct: 241 FLDNLDQLGQDLGYQLPSGAGSASDKVASQLLAAYGMSSLEYSAKLHFANTHKAQALL 298 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1633 

A DNA sequence (GBSx1728) was identified in S.agalactiae <SEQ ID 5045> which encodes the amino
acid sequence <SEQ ID 5046>. This protein is predicted to be heat shock protein 70. Analysis of this
protein sequence reveals the following:

Possible site: 25

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3874 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5047> which encodes the amino acid 
sequence <SEQ ID 5048>. Analysis of this protein sequence reveals the following: 

Possible site: 58

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3442 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 65/92 (70%), Positives = 76/92 (81%)

Query: 11 NRYKFVFGDKPLTLTTDKDNLFMEEIERVATEKYEAIKEKLPNADNETIAILMAINALSV 70 

NRYKF FG+K LTLTTDKDNLFMEE+ERVA EKY+A+K LP AD+ETIAILMAIN LS 
Sbjct: 5 NRYKFTFGEKTLTLTTDKBNLFMEEVERVAKEKYQALKNHLPEADDETIAILMAINTLST 64 

Query: 71 QLSREIDIEKMEDELNKLRSKTISDIKEKVSE 102 

QLSREI IEKME E+ LR KT+ ++EK ++ 
Sbjct: 65 QLSREIAIEKMEAEILDLRQKTLVGLQEKANQ 96 
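Each alignment row carries a start coordinate, a sequence segment and an end coordinate, and the number of residues in the segment (gap characters excluded) should equal end minus start plus one. The sketch below uses that relation to flag rows whose coordinates and sequence disagree. The helper name `check_row` is hypothetical, and the pattern assumes rows of the form "Query: start sequence end" with no spaces inside the sequence.

```python
import re

# Assumed row shape: "<Query|Sbjct>: <start> <sequence> <end>"
ROW_RE = re.compile(r"^(Query|Sbjct):\s*(\d+)\s+(\S+)\s+(\d+)\s*$")

def check_row(line):
    """Return coordinate/length details for one row, or None if it cannot be parsed."""
    m = ROW_RE.match(line.strip())
    if m is None:
        return None
    start, seq, end = int(m.group(2)), m.group(3), int(m.group(4))
    residues = len(seq.replace("-", ""))  # gap characters do not consume a position
    return {
        "label": m.group(1),
        "start": start,
        "end": end,
        "residues": residues,
        "consistent": residues == end - start + 1,
    }

rows = [
    "Query: 71 QLSREIDIEKMEDELNKLRSKTISDIKEKVSE 102",
    "Sbjct: 65 QLSREIAIEKMEAEILDLRQKTLVGLQEKANQ 96",
]
checked = [check_row(r) for r in rows]
for c in checked:
    print(c)
```

Rows that fail to parse or come back with `consistent` set to False are candidates for transcription damage in the printed alignment rather than evidence about the proteins themselves.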

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1634

A DNA sequence (GBSx1729) was identified in S.agalactiae <SEQ ID 5049> which encodes the amino
acid sequence <SEQ ID 5050>. Analysis of this protein sequence reveals the following:



INTEGRAL Likelihood =-10.99 Transmembrane 124 - 140 ( 114 - 148) 

INTEGRAL Likelihood = -5.84 Transmembrane 22 - 38 ( 21 - 40) 

INTEGRAL Likelihood = -4.88 Transmembrane 2 - 18 ( 1-20) 

INTEGRAL Likelihood = -1.97 Transmembrane 84 - 100 ( 84 - 100) 
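The INTEGRAL lines above list predicted membrane-spanning segments. A minimal sketch of how such lines could be tabulated follows; it is an illustration, not the procedure used in the original analysis. The helper name `parse_tm_segments` is hypothetical, and treating the first range as the predicted segment and the parenthesised range as an outer range is an assumption about the report layout.

```python
import re

# Assumed line shape:
# "INTEGRAL Likelihood = <score> Transmembrane <a> - <b> ( <c> - <d>)"
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_tm_segments(text):
    """Return the predicted segments sorted by their start position."""
    segments = []
    for m in TM_RE.finditer(text):
        segments.append({
            "likelihood": float(m.group(1)),
            "segment": (int(m.group(2)), int(m.group(3))),
            "outer_range": (int(m.group(4)), int(m.group(5))),  # assumed meaning
        })
    return sorted(segments, key=lambda s: s["segment"][0])

example = """
INTEGRAL Likelihood =-10.99 Transmembrane 124 - 140 ( 114 - 148)
INTEGRAL Likelihood = -5.84 Transmembrane 22 - 38 ( 21 - 40)
INTEGRAL Likelihood = -4.88 Transmembrane 2 - 18 ( 1-20)
INTEGRAL Likelihood = -1.97 Transmembrane 84 - 100 ( 84 - 100)
"""
segments = parse_tm_segments(example)
for s in segments:
    print(s["segment"], s["likelihood"])
```

Sorting by position rather than by likelihood makes it easier to compare the segment order with that of the related S.pyogenes protein.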



Final Results 

bacterial membrane --- Certainty=0.5394 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06827 GB:AP001517 unknown conserved protein in B. subtilis 
[Bacillus halodurans] 
Identities = 59/182 (32%), Positives = 98/182 (53%), Gaps = 14/182 (7%) 

Query: 1 MLSLLLLI IVIWHFYIGYSRGIFLQVFYVLMSMVSLMIASQFYQELASQITLWVPYS - -N 58 

MLS++LL I++ F+IG RG+ LQ+ ++L + + +A ++Y +A+ I LW+PY + 
Sbjct: 1 MLSVlLLFILLCSFFIGKRRGLILQLvHLLGFVAAFFVAYKYYAPVATYIRLWIPYPQFS 60 

Query: 59 PVQGVEVYFFKDISKFQLSHVYYAGVAFVFIY SLSYLVGRLLGVLLHLAPVEHFDS 114 

P V + IF +VYY+G+AF ++ L ++VG +L L HL + 

Sbjct: 61 PDSPVTML IEAFNFENVYYSGIAFALLFIGTKILLHIVGSMLDFLTHLPILRSV— 114 

Query: 115 LQNNIISGFIAvLVCLLFMSMCLTIIATVPMSFVQEKLraSLFVRFLIMDLPFFSQFLVR 174

N + G L + L M + L + A +P+ VQ L SL +F++N PF S+F+ 
Sbjct: 115 --NGWLGGILGFVEWLIMFVLLYVGALLPIETVQTHLNQSLVAQFIMNHTPFLSEFIRN 172 

Query: 175 TW 176 
W 

Sbjct: 173 LW 174 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5051> which encodes the amino acid
sequence <SEQ ID 5052>. Analysis of this protein sequence reveals the following:

Possible site: 59
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -3.17 Transmembrane 124 - 140 ( 117 - 148)
INTEGRAL Likelihood = -4.73 Transmembrane 84 - 100 ( 78 - 105)
INTEGRAL Likelihood = -0.00 Transmembrane 156 - 172 ( 156 - 172)

Final Results

bacterial membrane --- Certainty=0.4270 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:BAB06827 GB:AP001517 unknown conserved protein in B. subtilis 
[Bacillus halodurans] 
Identities = 57/177 (32%), Positives = 98/177 (55%), Gaps = 2/177 (1%)

Query: 1 MLSLLIvLILTVmFYIGYSRGIILQSFYVIyGALLSLLVAWFYIGLAHKLTLWIPYSNPV 60

MLS++++ IL +F+IG RG+ILQ ++LG + + VA ++Y +A + LWIPY 
Sbjct: 1 MLSVILLFILLCSFFIGKRRGLILQLVHLLGFVAAFFVAYKYYAPVATYIRLWIPYPQFS 60 

Query: 61 EGTSVFFFKSVDIFVLDKVYYAGLAFFIIFLLGYALSRFLGIFVHFLLLNYFDNQWTKCL 120 
+ V ++ F + VYY+G+AF ++F+ L +G + FL L

Sbjct: 61 PDSPVTML- - IFAFNFENVYYSGIAFALLFIGTKILLHIVGSMLDFLTHLPILRS VNGWL 118

Query: 121 SGGLAFLVSLLFLNMLLS I FATVPMPFLQHYLHSS FLARLVIEHLPPLTT I IQKLWI 177
G h F+ L + +LL + A +P+ +Q +L+ S +A+ ++ H P L+ 1+ LWI 
Sbjct: 119 GGILGFVEVYLIMEVLLWGALLPlETOQTHIiNQSLVAQFIMNHTPFLSEFIRNLWI 175 

An alignment of the GAS and GBS proteins is shown below.

Identities = 87/176 (49%), Positives = 123/176 (69%)

Query: 1 MLSLLLLIIVIWHFYIGYSRGIFLQVFYVIMSMVSLMIASQFYQELASQITLWVPYSNPV 60 
MLSLL+++I+ W+FYIGYSRGI LQ FYVL +++SL++A++FY LA + +TLW+PYSNPV

Sbjct: 1 MLSLLIVLILTWNFYIGYSRGIILQSFYVLGALLSLLVANRFYIGIiAHKLTLWIPYSNPV 60 

Query: 61 QGVEVYFFKDISKFQLSHVYYAGVAFVFIYSLSYLVGRLLGVLLHLAPVEHFDSLQHNII 120 
+G V+FFK + F L VYYAG+AF 1+ L Y + R LG+ +H 4 +FD+ 4 
Sbjct: 61 EGTSVFFFKSTOIFVLDKVYYAGIAFFIIFLLGYALSRFLGIFVHFLLLNYFDNQWTKCL 120

Query: 121 SGFIAVLVCLLFMSMCLTILATVPMSFVQEKLWNSLFVRFLINDLPFFSQFLVRTW 176 

SG LA LV LLF++M L+l ATVPM F+Q L +S R +1 LP + + + W 
Sbjct: 121 SGGLAFLVSLLFLNMLLSIFATVPMPFLQHYLHSSFLARLVIEHLPPLTI I IQKLW 176 






Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1635 

A DNA sequence (GBSx1730) was identified in S.agalactiae <SEQ ID 5053> which encodes the amino
acid sequence <SEQ ID 5054>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4176 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10117> which encodes amino acid sequence <SEQ ID
10118> was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB14818 GB:Z99118 similar to DNA mismatch repair protein
[Bacillus subtilis]
Identities = 320/790 (40%), Positives = 466/790 (58%), Gaps = 18/790 (2%)

Query: 10 MNNKILEQLEFNKVKELILPYLICTEQSQEELSELEPMTEAPKIEKSFNEISDMEQIFVEH 69
M K+L LEF+KVKE 4+ + + +E L EL+P +I+K +E+ + I
Sbjct: 1 MQQKVLSALEFHKVKEQVIGHAASSLGKEMLLELKPSASIDEIKKQLDEVDEASDIIRLR 60

Query: 70 HSFGIVSLSSISESLKRLELSADLNIQELLAIKKVLQSSSDMIHFYSDL--DNVSFQSLD 127
L I +L+R E+ + L+ E I +L + M HF 4 + D V +
Sbjct: 61 GQAPFGGLVDIRGALRRAEIGSVLSPSEFTEISGLLYAVKQMKHFITQMAEDGVDIPLIH 120

Query: 128 RLFENLEQFPNLQGSFQA- INDGGFLEHFASPELERIRRQLTNSERRVRQILQDMLKEKA 186
+ EL +L+ 4 I+D G + AS L IR QL E RVR L4 ML4 4
Sbjct: 121

Query: 187
Sbjct: 181

Query: 245
Sbjct: 241

Query: 305 lljMiINVRHPLL--SNPVAlfflLHFIX^I'TAIVITGEOTGGKTIMLKTLGIjAQLMGQSGLP 362
+ L RHPLL VAND+ +D + IVITGPNTGGKT+ LKTLGL LM QSGL 

Sbjct: 301 FIRLKKARHPLLPPDQVVRHDIEI^RDFSTIVITGPNTGGKTVTLKTLGLLTLMAQSGLH 360

Query: 363 VLADKGSKIAVFOT^IFADIGDEQSIEQSLSTFSSHMTHIVSILNEADHNSLVLFDELGAG 422

+ AD+GS+ AVF ++FADIGDEQSIEQSLSTFSSHM +IV IL + + NSLVLFDELGAG 

Sbjct: 361 IPftDEGSEAAVFEHVFADIGDEQSIEQSLSTFSSHMVNIVGILEQVNENSLVLFDELGAG 420

Query: 423 TDPQEGASLAMAILEHLRLSNIKTMATTHYPELKAYGIETNFVENASMEFDAETLSPTYR 482
TDPQEGA+LAM+IL+ + +N + +ATTHYPELKAYG V NAS+EFD ETLSPTY+ 
Sbjct: 421 TDPQEGAALAMSILDDVHRTNARVIATTHYPELKAYGYNREGVMHASVEFDIETLSPTYK 480 



Query: 483 FMQGVPGRSNAFEIASRIiGIAPFI VKQAK- QMTDSDSDVNRI IEQLEAQTLETRRRLDHI 541 

+ GVPGRSNAFEI+ RLGL 1+ QAK +MT ++V+ +1 LE L 
Sbjct: 481 LLIGVPGRSNAFEISKRLGLPDHIIGQAKSEMTASHNEVDTMIASLEQSKKRAEEELSET 540 

Query: 542 KEVEQENLKBTSIRAVKKLYNEFSHERDKELEKIYQEAQEIVDMALNESDTILKKL ND 597 

+ + +E K ++ +++ E + ++DK LE+ Q+A E V A+ E++ 1+ +L + 
Sbjct: 541 ESIRKEAEKLHKELQQQIIELNSKKDKMLEEAEQQAAEKVKAAMKEAEDIIHELRTIKEE 600 

Query: 598 KSQLKPHEIIDAKAQIKKIAPQTOLSKNKVLNKAKKIKAARAPRIGDDIIVTSYGQRGTL 657 

K HE+I+AK +++ P + SK K +K R + GD++ V ++GQ+GTL 
Sbjct: 601 HKSFKDHELINAKKRLEGAMPAFEKSKKPEKPKTQK RDFKPGDEVKVLTFGQKGTL 656 

Query: 658 TSQLKDGRWEAQVGI I KMTLTQDEFTLVRVQEEQKVKSKQINWKKADSSGPRARLDLRG 717 

+ W Q+GI+KM + + + ++ EKKKIVKD LDLRG 

Sbjct: 657 LEKTGGNEtWIVQIGILKMKVKEKDLEFIKSAPEPK-KEKMITAVKGKDYH-VSLELDLRG 714 

Query: 718 KRYEEAMQELDNFIDQALLNb#lGQVDIIHGIGTGVIREGOTKYLRRNKHVKHFAYAPQNA 777 

+RYE A+ ++ ++D A+L +V IIHG GTG +R+GV L+ ++ VK + 
Sbjct: 715 ERYENALSRVEKYLDDAVLAGYPRVSIIHGKGTGALRKGVQDLLKNHRSVKSSRFGEAGE 774 



Query: 778 GGSGATIVTL 787 
GGSG T+V L

Sbjct: 775 GGSGVTWEL 784 



A related DNA sequence was identified in S. pyogenes <SEQ ID 5055> which encodes the amino acid 
sequence <SEQ ID 5056>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3843 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 775/787 (98%), Positives = 781/787 (98%)

Query: 2 IMLGIMKSMNNKILEQLEFNKVKELILPYLKTEQSQEELSELEPMTEAPKIEKSFWEISD 61
I LGIMKSMNNKILEQLEFNKVKEL+LPYLKTEQSQEEL ELEPMTEAPKIEKSFNEISD
Sbjct: 32 II1GIMKSMNNKILEQLEFNKOTKELLLPYLKTEQSQEELLELEPMTEAPKIEKSFNEISD 91

Query: 62 MEQIFVEHHSFGIVSLSSISESLKRLELSADIMIQELLAIKKVLQSSSDMIHFYSDLDNV 121
MEQIFVEHHSFGIVSLSSISESLKRLELS DLNIQELLAIKKVLQSSSDMIHFYSDLDNV
Sbjct: 92 MEQIFVEHHSFGIVSLSSISESLKRLELSTDLNIQELLAI1CKVLQSSSDMIHFYSDLDNV 151

Query: 122 SFQSLDRLFENLEQFPNLQGSFQAINDGGFLEHFASPELERIRRQLTNSERRVRQILQDM 181
SFQSLDRLFENLEQFPNLQGSFQAINDGGFLEHFASPELERIRRQLTNSERRVRQILQDM
Sbjct: 152 SFQSLDRLFENLEQFPNLQGSFQAIIIDGGFLEHFASPELERIRRQIiTNSERRVRQILQDM 211

Query: 182 LKEKAELLSEl^IASRSGRSVLPVKKTYRl^ISGVAmDISSSGSTVYIEPPAVVTIJ^EI 241
LKEKAELLSEI^IASRSGRSvLPVKNTYRNRISGV^DISSSGSTVYIEPRAVVTLNEEI
Sbjct: 212 LKEKAELIiSENLIASRSGRSVI.PVKin-YRNRISGVVHDISSSGSTVYIEPRAVVTLNEEl 271

Query: 242 TQLRADERHEESRILHAFSDLLRPHVATIRN^JAWILGHLDFVRAKYLFMSDNKATIPEIS 301

TQLRADERHEE RI LHAFSDLLRPHVAT I RNNAW I LGHLDFVRAKYLFMSDNKATI P+ 1 S 
Sbjct: 272 TQLRADERHEEGRILHAFSDLLRPBT/ATIRNNAWILGHLDFVRAKYLFMSDNKATIPKIS 331 

Query: 302 ITOSTIiALINVRHPLLSNPVANDLHFDQDLTAIVITGPjSITGGKTIMLKTLGIAQLMGQSGL 361 

NDSTLALINVRHPLLSNPVANDLHFD DLTAIVITGPNTGGKTIMLKTLGLAQLMGQSGL 
Sbjct: 332 NDSTIALINVRHPLLSNPVANDLHFDHDLTAIVITGPNTGGKTIMLKTLGLAQLMGQSGL 391 

Query: 362 PVLADKGSKIAVFNNIFADIGDEQSIEQSLSTFSSHMTHIVSILNEADHNSLVLFDELGA 421 

'PVl^KGSKIAVFNNIFADIGDEQSIEQSLSTFSSHMTHIVSILNEADHNSLVLFDELGA 
Sbjct: 392 PVLADKGSKIAVFNWIFADIGDEQSIEQSLSTFSSHMTHIVSILNEADHNSLVLFDELGA 451 

Query: 422 GTDPQEGASLAMAILEHLRLSNIKTMATTHYPELKAYGIETNFVENASMEFDAETLSPTY 481 

GTDPQEGASLAMAILEHLRLS+IKTMATTHYPELKAYGIETNFVENASMEFDAETLSPTY 
Sbjct: 452 GTDPQEGASLAMAILEHLRLSHIKTMATTHYPELKAYGIETNFVENASMEFDAETLSPTY 511 

Query: 482 RFMQGVPGRSNAFEIASRLGLAPFIVKQAKQMTDSDSDVMRIIEQLEAQTLETRRRLDHI 541 

RFMQGVPGRSNAFEIASRLGIjAPFIVKQAKQMTDSDSDVNRIIEQLEAQTLETRRRLDHI 
Sbjct: 512 RFMQGVPGRSNAFEIASRLGLAPFIVKQAKQMTDSDSDVNRIIEQLEAQTLETRRRLDHI 571 

Query: 542 KEVEQENLKFNRAVKKLYI^FSHERDKELEKIYQFAQEIVDMAIiNESDTILKKLiro 601 
I 

Sbjct: 572 1 

Query: 602 KPHEIIDAKAQIKKIAPQVDLSKNKVLNKAKKIKAARAPRIGDDIIVTSYGQRGTLTSQL 661 

KPHEIIDAKAQIKKIAPQVDLSKNKVLNKAKKIKAARAPRIGDDIIVTSYGQRGTLTSQL 
Sbjct: 632 KPHEIIDAKAQIKKLAPQVDLSKNKVLNKAKKIKAARAPRIGDDIIVTSYGQRGTLTSQL 691 

Query: 662 KDGRWEAQVGIIKMTLTQDEFTLVRVQEEQKVKSKQINV\fKKADSSGPRARLDLRGKRYE 721 

KDGRWEAQVGI IKMTLTQDEF+LVRVQEEQKVK+KQINWKKAD SGPRARLDLRGKRYE 
Sbjct: 692 KDGRWEAQVGI IKMTLTQDEFSLVRVQEEQKVKNKQINWKKADGSGPRARLDLRGKR YE 751 

Query: 722 EAMQELDNFIDQALLN1W1GQVDIIHGIGTGVIREGVTKYLRRNKHVKHFAYAPQNAGGSG 781 

EAMQELD+ F I DQALLNNMGQVDI IHGIGTGVIREGOTO'LRRNKHVKHFAYAPQNAGGSG 
Sbjct: 752 FAMQELDHFIDQALLNNMGQVDIIHGIGTGVIREGVTKYLRRNKHVKHFAYAPQNAGGSG 811 

Query: 782 ATIVTLG 788 

ATIVTLG 
Sbjct: 812 ATIVTLG 818 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1636 

A DNA sequence (GBSx1731) was identified in S.agalactiae <SEQ ID 5057> which encodes the amino
acid sequence <SEQ ID 5058>. This protein is predicted to be thioredoxin (trxA). Analysis of this protein
sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2721 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10115> which encodes amino acid sequence <SEQ ID
10116> was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB40815 GB:AJ133006 thioredoxin [Listeria monocytogenes] (ver 2)

Identities = 64/100 (64%), Positives = 78/100 (78%), Gaps = 1/100 (1%) 

Query: 15 MALEVTDATFVEETI<EGLVL1DFI^ATWCGPCRMQRPILEQLSQEIDEDELKILKMDVDEN 74 
M E+TDATF +ET EGLVL DFWATWCGPCRM AP+LE++ +E E LKI+KMDVDEN

Sbjct: 1 MVKEITDATFEQETSEGLvLTDFWATWCGPCRMVAPVLEEIQEERGE-ALKIVKMDVDEN 59 

Query: 75 PETARQFGIMS I PTLMFKKDGE WKQVAGVHTKDQLKAI I 114 
PET FG+MSIPTL+ KKDGEW+ + G K++L +1 
Sbjct: 60 PETPGSFGVMSIPTLLIKKDGEWETIIGYRPKEELDEVI 99

A related DNA sequence was identified in S.pyogenes <SEQ ID 5059> which encodes the amino acid
sequence <SEQ ID 5060>. Analysis of this protein sequence reveals the following:

Possible site: 48
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2721 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1637 

A DNA sequence (GBSx1732) was identified in S.agalactiae <SEQ ID 5061> which encodes the amino
acid sequence <SEQ ID 5062>. Analysis of this protein sequence reveals the following:

Possible site: 33

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -7.54 Transmembrane 170 - 186 ( 167 - 191)
INTEGRAL Likelihood = -5.52 Transmembrane 87 - 103 ( 86 - 107)
INTEGRAL Likelihood = -4.62 Transmembrane 105 - 121 ( 104 - 126)



Final Results 

bacterial membrane --- Certainty=0.4015 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MEIGQQIIRYRKQQALSQEELAEKVYVSRQSISNWENDKTYPDIHSLLLLSQIFQVSLDQ 60 
M++ +++ RK++ LSQE+LAEK+ +SRQ++S WE+ ++ PD++ L++LS+++ V++D 

Query: 61 LIKGDIE 67 

L+K E 
Sbjct: 61 LVKETYE 67 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1739> which encodes the amino acid 
sequence <SEQ ID 1740>. Analysis of this protein sequence reveals the following: 

Possible site: 36

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -8.86 Transmembrane 173 - 189 ( 169 - 194) 
INTEGRAL Likelihood = -5.52 Transmembrane 90 - 105 ( 89 - 110) 
INTEGRAL Likelihood = -4.62 Transmembrane 108 - 124 ( 107 - 129) 



Final Results

bacterial membrane --- Certainty=0.4545 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below.

Identities = 187/195 (95%), Positives = 191/195 (97%)

Query: 1 MEIGQQIIRYRKQQALSQEELAEKVYVSRQSISNWENDKTYPDIHSLLLLSQIFQVSLDQ 60 
MEIGQQIIRYRKQQALSQE+LAEKVYVSRQSISNPJENDKTYPDIHSLLLLSQIFQVSLDQ 
Sbjct: 4 MEIGQQIIRYRKQQALSQEK1AEKVYVSRQSISNWENDKTYPDIHSLLLLSQIFQVSLDQ 63



Query: 61 LIKGDIEKMKYTITQVDKKNFERDTKVMVTLMILLMISSYPLVYFLEWLGLGIFVLLSII 120 

LIKGDIEKMOTITQVDKKNF+RDTCTMWLMILLMISSYPLVYFLEWLGLGIFVLLSII 
Sbjct: 64 LIKGDIEKMOTITQVDKKNFKRDTKVMVTLMILLMISSYPLVYFLEWLGLGIFVLLSII 123 

Query: 121 TMTYi^VERFKKKYDVQTYKEIIAVSSGKLLDEIEKREERAKLPYQKPLIVTVFFLITV 180 

TMTYANRVERFKKKYDVQ YKEILAVS +GKLLDE IEKREERA LPYQKPLIVTVFFLITV 
Sbjct: 124 TMTYANRVERFiaaCYDVQPYKEILAVSNGKLLDEIEKREERATLPYQKPLIVTVFFLITV 183 

Query: 181 ATFFASRFIFTWLFH 195 

A FASRF+FTWLFH 
Sbjct: 184 AFAFASRFMFTWLFH 198 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1638 

A DNA sequence (GBSx1733) was identified in S.agalactiae <SEQ ID 5063> which encodes the amino
acid sequence <SEQ ID 5064>. This protein is predicted to be adenine glycosylase (mutY). Analysis of this
protein sequence reveals the following:

Possible site: 30

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2385 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
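Each "Final Results" block assigns a certainty to three candidate locations, and the predicted location is the one with the highest value. The sketch below reads such a block into a dictionary. It assumes only the line layout seen in this document; the function name `parse_final_results` is illustrative, and the pattern deliberately tolerates stray spaces around the decimal point.

```python
import re

# Matches e.g. "bacterial cytoplasm --- Certainty=0.2385 (Affirmative)";
# the separator dashes and spaces around "=" and "." are optional.
_RESULT = re.compile(
    r"bacterial\s+(\w+)[\s-]*Certainty\s*=\s*(\d)\s*\.\s*(\d+)\s*"
    r"\((Affirmative|Not Clear)\)"
)


def parse_final_results(text: str) -> dict[str, tuple[float, str]]:
    """Map each location name to (certainty, verdict)."""
    results = {}
    for site, whole, frac, verdict in _RESULT.findall(text):
        results[site] = (float(f"{whole}.{frac}"), verdict)
    return results


block = """
bacterial cytoplasm --- Certainty=0.2385 (Affirmative) < succ>
bacterial membrane --- Certainty=0 . 0000 (Not Clear) < succ>
bacterial outside --- Certainty=0. 0000 (Not Clear) < succ>
"""

results = parse_final_results(block)
# The location with the highest certainty is the predicted one.
predicted = max(results, key=lambda site: results[site][0])
```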

A related GBS nucleic acid sequence <SEQ ID 9425> which encodes amino acid sequence <SEQ ID 9426> 
was also identified. 



The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB04650 GB:AP001510 adenine glycosylase [Bacillus halodurans]
Identities = 130/331 (39%), Positives = 190/331 (57%), Gaps = 15/331 (4%)



Query: 1 MLQQTQVNTVIPYYKRFLEWFPQIKDLADAPEEQLLKAWEGLGYYSRVRNMQKAAQQVMV 60

MLQQT+V+TVI PYY+ F+ FP ++ LA A E+Q+LKAWEGLGYYSR RN+Q A ++V+ 
Sbjct: 45 MLQQTRVDWIPYYQAFMRQFPTLETIAYAEEDQVLKAIVEGLGYYSRARNLQSAVREVVE 104 

Query: 61 DFGGIFPHTYDDIASLKGIGPYTAGAIASISFNLPEPAVDGNVMRVMARLFEVNYDIGDP 120

+GG P T +1+ LKG+GPYTAGAI SI+++ PEPAVDGNVMRV++R+ + DI 
Sbjct: 105 SYGGEVPSTRKEISKLKGVGPYTAGAII.SIAYDQPEPAVDGNVMRVLSRVLYIEEDIAKV 164 

Query: 121 KNRKIFQAIMEILIDPDRPGDFNQALMDLGTDIESAKTPRPDESPIRFFNAAYLNGTYSK 180 

K R +F++++ LI + P FNQ LM+LG + + +P P+R A4 G + 

Sbjct: 165 KTRTLFESLLYDLISKENPSFFNQGLMELGALVCTPTSPGCLLCPVRDHCRAFAAGVQEQ 224 

Query: 181 YPIKNTKKKPKPMRIQAFVIRNQNGQYLLEKNTKGRLLGGFWSFPIIETSPLSQQLDLFD 240 

PIK KKKPK ++ A VIRN+ GQ L+E+ + LL W FP +E L 
Sbjct: 225 LPIKAKKKKPKAKQLIAAVIRNEKGQVLIERRPEKGLLAKLWQFPNVE LES 275

Query: 241 DNQSNPIIWQTQNETFQREYQLKPQWTDNHFPNIKHTFSHQKWTIELIEGVVKAT-DLPN 299

+ ++ +E P + + + ++H FSH W I + E VK L + 

Sbjct: 276 TKNAQQVLGDYIHERFHLDAAV GEYVQTVEHVFSHLIWNIRVYEATVKGVPSLND 330 

Query: 300 APHLKWVAIEDFSLYPFATPQKKMLETYLKQ 330 

WV Y F +K+++ L++ 

Sbjct: 331 KYEADWVDDRTIENYAFPVSHQKIIQGNLRK 361 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5065> which encodes the amino acid 
sequence <SEQ ID 5066>. Analysis of this protein sequence reveals the following: 

Possible site: 60 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3579 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 330/333 (99%) , Positives = 331/333 (99%) 

Query: 1 MLQQTQVNTVIPYYKRFLEWFPQIKDLADAPEEQLLKAWEGLGYYSRVRNMQKAAQQVMV 60
MLQQTQVNTVIPYYKRFLEWFPQIKDLADAPEEQLLKAWEGLGYYSRVRNMQKAAQQVMV
Sbjct: 52 MLQQTQVNTVIPYYKRFLEWFPQIKDLADAPEEQLLKAWEGLGYYSRVRNMQKAAQQVMV 111

Query: 61 DFGGIFPHTYDDIASLKGIGPYTAGAIASISFNLPEPAVDGNVMRVMARLFEVNYDIGDP 120
DFGGIFPHTYDDIASLKGIGPYTAGAIASISFNLPEPAVDGNVMRVMARLFEVNYDIGDP
Sbjct: 112 DFGGIFPHTYDDIASLKGIGPYTAGAIASISFNLPEPAVDGNVMRVMARLFEVNYDIGDP 171

Query: 121 KNRKIFQAIMEILIDPDRPGDFNQALMDLGTDIESAKTPRPDESPIRFFNAAYLNGTYSK 180
KNRKIFQAIMEILIDPDRPGDFNQALMDLGTDIESAKTPRPDESPIRFFNAAYLNGTY K
Sbjct: 172 KNRKIFQAIMEILIDPDRPGDFNQALMDLGTDIESAKTPRPDESPIRFFNAAYLNGTYGK 231

Query: 181 YPIKNTKKKPKPMRIQAFVIRNQNGQYLLEKNTKGRLLGGFWSFPIIETSPLSQQLDLFD 240
YPIKN KKKPKPMRIQAFVIRNQNGQYLLEKNTKGRLLGGFWSFPIIETSPLSQQLDLFD
Sbjct: 232 YPIKNPKKKPKPMRIQAFVIRNQNGQYLLEKNTKGRLLGGFWSFPIIETSPLSQQLDLFD 291

Query: 241 DNQSNPIIWQTQNETFQREYQLKPQWTDNHFPNIKHTFSHQKWTIELIEGVVKATDLPNA 300
DNQSNPIIWQTQNETF+REYQLKPQWTDNHFPNIKHTFSHQKWTIELIEGVVKATDLPNA
Sbjct: 292 DNQSNPIIWQTQNETFEREYQLKPQWTDNHFPNIKHTFSHQKWTIELIEGVVKATDLPNA 351

Query: 301 PHLKWVAIEDFSLYPFATPQKKMLETYLKQKNA 333
PHLKWVAIEDFSLYPFATPQKKMLETYLKQKNA
Sbjct: 352 PHLKWVAIEDFSLYPFATPQKKMLETYLKQKNA 384
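The summary line that heads each alignment can be split into its counts with a short pattern. The sketch below is illustrative only (the names `parse_summary` and `truncated_percent` are not from any alignment tool). It also shows that the identity percentages printed in this section are consistent with truncation rather than rounding: 187/195 is 95.9% but is printed as 95%. Whether the same rule holds for the "Positives" figures is not assumed here.

```python
import re

_SUMMARY = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_summary(line: str) -> dict[str, tuple[int, int]]:
    """Map 'identities', 'positives' and 'gaps' to (count, alignment length)."""
    return {name.lower(): (int(n), int(d)) for name, n, d in _SUMMARY.findall(line)}


def truncated_percent(count: int, length: int) -> int:
    """Whole-number percentage, discarding the fractional part."""
    return (100 * count) // length


summary = parse_summary("Identities = 330/333 (99%) , Positives = 331/333 (99%)")
identity_pct = truncated_percent(*summary["identities"])
```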

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1639 

A DNA sequence (GBSx1734) was identified in S.agalactiae <SEQ ID 5067> which encodes the amino
acid sequence <SEQ ID 5068>. This protein is predicted to be maltose/maltodextrin transport system 
(malG). Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-10.30 Transmembrane 14 - 30 ( 5 - 35) 

INTEGRAL Likelihood = -6.95 Transmembrane 248 - 264 ( 242 - 267) 

INTEGRAL Likelihood = -5.15 Transmembrane 75 - 91 ( 74 - 94) 

INTEGRAL Likelihood = -3.19 Transmembrane 110 - 126 ( 110 - 127) 

INTEGRAL Likelihood = -2.13 Transmembrane 141 - 157 ( 138 - 157)

INTEGRAL Likelihood = -0.32 Transmembrane 188 - 204 ( 188 - 204)

Final Results

bacterial membrane --- Certainty=0.5118 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

i transport system 

>%) , Gaps = 5/281 (1%) 

Query: 1 

Sbjct: 1 
Query: 59 



i GB:AP001517 maltose/maltodextri: 
(permease) [Bacillus halodurans] 
Identities = 117/281 (41%) , Positives = 169/281 (5£ 



MNKK- -KRiaLTFVYILLIVLSIlMLFPIVW^TSFRGEGSAFVNyFIPKTWTLDNyAK 58 
MNKK RL +T +Y+ L+V+ 1+ L+P++W V S S F + IP+T + +Y 

MNKKVKSRLEVTAIYLFLLVMGIVILYPLLWTVGLSLNPGTSLFSSRMIPETISFRHyEW 6 0 



Query: 119 GFMSMIAvYYILKALNLDQTLTALIFVY-SAGAALTFYIAKGFFDTIPYSLDESAMIDGA 177 

M+M+A+Y +L +NL TL LI +Y + ++ KG+FDTIP LDESA +DGA 

Sbjct: 121 VLMMWALYILIjNTVNLLDTLLGLILIYVGTSIPMJAFLVKGYFDTIPRELDESAKLDGA 180 

Query: 178 TRLDIFLKITLPLSKPIIVYTALIAFMGPWMDFIFAKVILGDATSKYTVAIGLFSMLQQD 237 

IF I LPL+KPI+ AL FM P+MDFI ++IL + YT+A+GLF+ + 
Sbjct: 181 GHFRIFFTIMLPLAKPIIAWALFNFMSPFmFILPRIIL-RSPKOTTIALGLFNFVNDQ 239 

Query: 238 TINQWFMSFTAGSVIIAIPITILFMFMQKYYVEGITGGSVK 278 

N F F AG+++IAIPI +F+F+Q+Y + G+T G+ K 
Sbjct: 240 FANN-FTRFAA3AILIAIPIATVFLFLQRYLISGLTTGATK 279 



A related DNA sequence was identified in S.pyogenes <SEQ ID 5069> which encodes the amino acid 
sequence <SEQ ID 5070>. Analysis of this protein sequence reveals the following:



Possible sit< 
>» Seems to have 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 



cleavable 1 
Likelihood = -6 
Likelihood = -6 
Likelihood = -3 
Likelihood = -1 
Likelihood = -1 



:m signal seq. 

Transmembrane 76 - 92 

Transmembrane 248 - 264 

Transmembrane 110 - 126 

Transmembrane 129 - 145 

Transmembrane 188 - 204 



129 - 145 

138 - 204; 



Final Results

bacterial membrane --- Certainty=0.3569 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases:

>GP:CAA6000S GB:X86014 cymG [Klebsiella oxytoca]
Identities = 119/270 (44%) , Positives = 172/270 (63%) , Gaps = 7/270 (2%) 

Query: 11 LVYATLI ILS I IWLFPIAWVILTSFRSEGTAYVNYFI PKTFTLNHYINLFTNETFPFGKW 70 

LVY L++ +++ L P+ W +++S + 4 + F +FTL HY NL T P+ KW 

Sbjct: 12 LVYLFLLLNALVVLGPVIWTW1SSLKPGNNLPSSGFTEISFTLEHYHNLLTGT--PYLKW 69 



FMNTLIVATFTCIISTFITVAIAYSLSRIKFKFRNGFLKLALILNMFPGFMSMIAIYYIL 13 0 
+ NT I+AT +IS + A+ SR +FK + L L+L MFP F+SM AIY +L 
YKNTFIIATCNMLISLVVVTITAFIFSRYRFKAKKKILMSILVLQMFPAFLSMTAIYILL 129 



Query: 
Sbjct: 

Query: 131 KALGLTQTLTALVLVYSSGAALGF- -YIAKGFFDTIPYSLDESftMIDGATRMDIFFKITL 188 

+ L T L+LVY +G+ L F ++ KG+FD IP SLDE+A IDGA + IFF+I L 
Sbjct: 130 SKMNLIDTYIGLLLVYVTGS - LPFMTWL VKGYFDAI PT3LDEAAKIDGAGHLTIFFEI IL 188 

Query: 189 PLAKPI IVYTALLAFMGPWIDFI FAQVIljGnATSKYTVAIGLFSMLQPDTINNWFMAFTA 248 
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PLAKPI+V+ AL++F GPW+DFI +IL + K T+AIG+FS + ++ N F FA 
• Sbjct: 189 PLAKPILVFVALVSFTGPWMDFILPTLIL-RSEDKMTIAIGIFSWISSNSAEN-FTLFAA 246 

Query: 249 GSVLIAVPITLLFMFMQKYYVEGITGGSVK 278 

G++L+AVPITLLF+ QK+ G+ G+VK 
Sbjct: 247 GALLVAVPITLLFIVTQKHITTGLVSGAVK 276 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 227/278 (81%) , Positives = 253/278 (90%) 

Query. 1 MNKKKRIJ3LTFWILLIVLSIMWLFPIVWVVLTSFRGEGSAFV1^FIPKTWTLD]>IYAKLF. 60 

M K+R L VY LI+LSI+WLFPI WV+LTSFR EG+A+ VNYF I PKT+TL++ Y LF 
Sbjct: 1 MKNKFJIFQLGLVYATLIILSIIWLFPIAWVILTSFRSEGTAYWYFIPKTFTLNHYINLF 60 

Query: 61 TQOTFPFGQWFIOTLFVATCTCILSTLITVA^IAYSiSRIKFKHRNGFLKLALVLNMFPGF 120 

T TFPFG+WF+NTL VAT TCI+ST ITVA+AYSLSRIKFK RNGFLKLAL+LNMFPGF 
Sbjct: 61 TNETFPFGKWFMNTLIVATFTCIISTFITVAIAYSLSRIKFKFRNGFLKLALILNMFPGF 120 

Query: 121 MSMIAVYYILKALNLDQTLTALIFVYSAGAALTFYIAKGFFDTIPYSLDESAMIDGATRL 180
MSMIA+YYILKAL L QTLTAL+ VYS+GAAL FYIAKGFFDTIPYSLDESAMIDGATR+
Sbjct: 121 MSMIAIYYILKALGLTQTLTALVLVYSSGAALGFYIAKGFFDTIPYSLDESAMIDGATRM 180

Query: 181 DIFLKITLPLSKPIIVYTALIAFMGPWMDFIFAKVILGDATSKYTVAIGLFSMLQQDTIN 240
DIF KITLPL+KPIIVYTAL+AFMGPW+DFIFA+VILGDATSKYTVAIGLFSMLQ DTIN
Sbjct: 181 DIFFKITLPLAKPIIVYTALLAFMGPWIDFIFAQVILGDATSKYTVAIGLFSMLQPDTIN 240

Query: 241 QWFMSFTAGSVIIAIPITILFMFMQKYYVEGITGGSVK 278
WFM+FTAGSV+IA+PIT+LFMFMQKYYVEGITGGSVK
Sbjct: 241 NWFMAFTAGSVLIAVPITLLFMFMQKYYVEGITGGSVK 278

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1640 

A DNA sequence (GBSx1735) was identified in S.agalactiae <SEQ ID 5071> which encodes the amino
acid sequence <SEQ ID 5072>. This protein is predicted to be cymF protein (malF). Analysis of this protein 
sequence reveals the following: 

j N-terminal signal sequence 



Likelihood =-11 
Likelihood 
Likelihood 
Likelihood 
INTEGRAL Likelihood 
INTEGRAL Likelihood 
INTEGRAL Likelihood 



Transmembrane 427 - 443 ( 417 - 

24 Transmembrane 99-115 ( 96 - 

39 Transmembrane 166 - 182 ( 154 - 

21 Transmembrane 259 - 275 ( 257 - 2761 

21 Transmembrane 229 - 245 ( 223 - 

10 Transmembrane 44 - 60 ( 40 - 66! 

51 Transmembrane 314 - 330 ( 312 - 



Final Results 

bacterial membrane --- Certainty=0.5585 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 27 SFLIMGIUAl^KNKQIVKGLLFLISEILFLITFVYQVIEAWGLISLGTQEQGMTTiCrVDG 86 

SFLIMG L + +KG +FL+ +1+ +1+ + ++ A +GLI+LGT Q T G 

Sbjct: 15 SFLIMGATQLISGHWIKGSVFLLFQIV-VISNINLLLNATQGLITLGTVAQ TRSG 68 

Query: 87 IKIQVATQGDNSMLMLIFGLASLIFCCVFAYIYWSNIKSAAHLLTLKEEGREIPSFKKDI 146 
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Query: 147 KSLTDGRPHMTLMSIPLIGVLLFTILPLVYMICLAFTNYDH-HHLPPKSLFDWVGFANFG 205 

+++ D RF +++ I + F I+P++ + ++ TNY +H+PPK+L DWG NF 
Sbjct: 120 RTIYDNRFATIMLAPAFIACIAFIIMPMIITVLVSLTNYSAPHHIPPKNLVDWVGLKNFI 179 

Query: 206 NI FSGRMAS -TFFPVLSWTLIWAWATVTNFFFGIILALLINTKGLKFKKMWRTI FVITM 264 

+F R+ S TF + WT++WA FAT+ FG +LAL + K + KK MR +F++ 
Sbjct: 180 TLFELRIWSKTFVGIGVWTVLWAFFATLCTCSFGFLIALALENKKIIAKKAWRWFILPY 239 

Query: 265 AVPQFISLLIMRNLLSDAGPVNALLIKWGLISSAHPLPFLSDPWAKFSIIFVNMWVGIP 324 

A+P F++LLI R LL+ GPVN+ L WG+ S + FLSDP+ AK ++I V++WVG P 
Sbjct: 240 AIPAFVTLLIFRLLLNGIGPVNSTLNSWGIDS IGFLSDPLIAKMTVIAVSVWVGAP 295 

Query: 325 VTMLVATGI IMNLPAEQIEAAEIDGANKFQVFQS ITFPQILLIMTPTLIQQFIGNINNFN 384 

ML+ TG + N+P + EA+E+DGA+KFQ F+ IT P +L + P+L+ F N NNF 
Sbjct: 296 YFMLLITGAMTNIPRDLYEASEVDGASKFQQFREITLPMVLHQVAPSLVMTFAHNFNNFG 355 

Query: 385 VIYLLTQSSPTNSTYYQAGSTDLLVTWLYI^TVTAADYNLASVVGILIFILSAVFSLLAY 444 

IYLLT+GGP N Y AG TD+L+TW+Y LT+ Y +ASV+ I+IF+ ++F++ + 
Sbjct: 356 AI YLLTEGGPINPEYRFAGHTD I LI TWI YKLTLDFQQYQIASVI S I I I FLFLS I FAIWQF 415 

Query: 445 TRTNSYKE 452 

R S+KE 
Sbjct: 416 RRMKSFKE 423 



A related DNA sequence was identified in S. pyogenes <SEQ ID 5073> which encodes the amino acid 
sequence <SEQ ID 5074>. Analysis of this protein sequence reveals the following: 



Possible 
•> Seems to h 
INTEGRAL 
INTEGRAL 



3 N-terminal signal sequence 



INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 



Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 



Transmembrane 98 - 114 

Transmembrane 165 - 181 

Transmembrane 424 - 440 

Transmembrane 43 - 5 9 

Transmembrane 258 - 274 



09 



Transmembrane 311 - 



■ 275) 
• 246) 

■ 328) 



Final Results

bacterial membrane --- Certainty=0.5373 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases:



, Gaps = 19/426 (4%) 

Query: 26 SSIIMGFANFANKQFIKGILFLISELIFLVAFVSQIIPAIRGLVTLGTQTQGMTTKTIDG 85 
S +IMG + +IKG +FL+ +++ +++ ++ ++ A +GL+TLGT Q T G 

15 SFLIMGATQLISGHWIKGSVFLLFQIV-VISNINLLLNATQGLITLGTVAQ TRSG 68 



Sbjct: 
Query: 

+1 

Sbjct: 69 FDI 



INIQVAVDGDNSMLMLIFGLASLIFCLVFAYIYWCNLKSARNLYLFKQKGQKIPSFKEDL 145 
+1 V GDNS+ ML+ G+ + IF ++YW N+K A+ Q SF E L 

•VAGDNSIFMLVEGWAFIFLFFSIFVYWLNIKDAQVCEKCHQ SFTEQL 119 



Query: 146 ATLTNGRFHMTLMAIPLIGVLLFTILPLIYMICIiAFTNFDH-NHLPPKSLFDWVGIANFG 204 

T+ + RF ++A I + F I+P+I + ++ TN+ +H+PPK+L DWVGL NF 
Sbjct: 120 RTIYDNRFATIMLAPAFIACIAFIIMPMIITVLVSLTNYSAPHHIPPKNLVDWVGLKNFI 179 

Query: 205 NVLSGPJ«I-AGTFFPIFSWTLIWAWATVTNFFFGIIIiALLINTKGLKWKKMWRTIFVITI 263 

+ R+ + TF I WT++WA FAT+ FG +LAL + K + KK WR +F++ 
Sbjct: 180 TLFELRIWSKTBVGIGVWr/LWAFFATLCTCSFGFLLALALENKKIIAKKAWRWFILPY 239 

Query: 264 AVPQFISLLIMRNLLNDEGPLNALLl^IKIGLINGSLPFLSDPLWAKFSIIFVNMWIGIPFT 323 
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A+P F++LLI R LLN GP+N+ LN G+ S+ FLSDPL AK ++I V++W+G P+ 
Sbjct: 240 AIPAFVTLLIFRLI 1 LNGIGPVNSTIitJSWGI--DSIGFI,SDPLIAKMTVIAVSVWVGaPYF 297 

Query: 324 MLIATGIIMNLPSEQIEAAEIDGASKFQVFKSITFPQILLIMTPNLIQQFIGNINNFNVI 333 
5 ML+ TG + N+P + EA+E+DGASKFQ F+ IT P +L + P+L+ F N NNF I 

Sbjct: 298 MLLITGAMTNIPRDLYEASEVDGASKFQQFREITLPMVLHQVAPSLVMTFAHNFN1JFGAI 357 

Query: 384 YLLTGGGPTNSEYYQAGTTDLl.TOaYKLTVTAADYmASVIGILIFTVSAIFSLLAYTR 443 
YLLT GGP N EY AG TD+L+TW+YKLT+ Y +ASVI I+IF +IF++ + R 
10 Sbjct: 358 YLLTEGGPINPEYRFAGHTDILITWIYKLTLDFQQYQIASVISI I IFLFLSIFAIWQFRR 417 

Query: 444 TASYKE 449 
S+KE 

Sbjct: 418 MKSFKE 423 


An alignment of the GAS and GBS proteins is shown below. 

Identities = 357/446 (80%) , Positives = 404/446 (90%) , Gaps = 2/446 (0%) 



Query: 11 MSLKEVFQKGDLATKLSFLIMGLAKLKNKQIVKGLLFLISEILFLITFVYQVIPAVKGLI 70 

+S+ E ++G KLS +IMG AN NKQ +KG4LFLISE++FL+ FV Q+IPA4+GL+ 
Sbjct: 10 ISVIEALKRGSITOIKLSSIIMGFANFANKQFIKGILFLISELIFLVAFVSQIIPAIRGLV 69 

Query: 71 SLGTQEQGMTTKTVDGIKIQVATQGD^ISMLMLIFGIASLIFCCVFAYIYWSNIKSAAHLL 130 

+LGTQ QGMTTKT+DGI IQVA GDNSMLKLIFGIAS^IFC VFAYIYW N+KSA +L 
Sbjct: 70 TLGTQTQGMTTKTIDGINIQVAVDGDNSMLMLI FGIASLI FCLVFAYI YWCNLKSARNLY 129 

Query: 131 TLKEEGREIPSFKKDIKSLTDGRFHMTLMSIPLIGViliFTILPIiVYMICLAFTNYDHHHL 190 

K++G++IPSFK+D+ +LT+GRFHMTLM+IPI.IGVLLFTILPL+YMICLAFrN+DHNHL 
Sbjct: 130 LFKQKGQKIPSFKEDLaTLTNGRFHMTI^IPLIGVLI^FTILPLIY^IICIAFTNFDHNHL 189 

Query: 191 PPKSLFDWGFANFGNIFSGRM&STFFPVLSWTLIWAVFATVTNFFFGIIIALLINTKGL 250 

PPKSLFDWVG ANFGN+ SGRMA TFFP+ SWTLIWAVFATVTNFFFGI ILALLINTKGL 
Sbjct: 190 PPKSLFDWGIANFGm/LSGRMAGTFFPIFSOTLIWAWATVTNFFFGIILALLINTKGL 249 

Query: 251 KFKKMWRTIFVITmVEQFISLLII>5R]^LSDAGPVlJALLIKWGLISSAHPLPFLSDPWA 310 

K+KKMWRTIFVIT+AVPQFISLLIMRNLL+D GP+NALL K GLI+ + LPFLSDP+WA 
Sbjct: 250 KWKKMWRTI FVITIAVPQFI SLLIMRNLIiNDEGPLTJALIiNKIGLINGS - - LPFLSDPLWA 307 



Query: 311 KFSIIFVNMWVGIPVTMLVATGIIMNIjPAEQIEAREIDGANKFQVFQSITFPQILLIMTP 370 

KFSIIFVNMW+GIP TML+ATGIIMNLP+EQIEAAEIDGA+KFQVF+SITFPQILLIMTP 
Sbjct: 308 KFSIIFVNMWIGIPFTMLIATGIIMNLPSEQIEAAEIDGASKFQVFKSITFPQILLIMTP 367 

Query: 371 TLIQQFIGNINNFNVIYLLTQGGPTOSTYYCAGSTDIiLVTWLYl^TVTAADYMlASVVGI 430 

LIQQFIGNINNFNVIYLLT GGPTNS YYQAG+TDLLVTWLY LTVTAADYNLASV+GI 
Sbjct: 368 NLIQQFIGNIIOTFNVIYLLTGGGPTNSEYYQAGTTDLLVTWLYKLTVTAADYNLASVIGI 427 



Query: 431 LIFILSAVFSLLAYTRTNSYKEGAAK 456 

LIF +SA+FSL1AYTRT SYKEGAAK 
Sbjct: 428 LIFTVSAIFSLLAYTRTASYKEGAAK 453 


A related GBS gene <SEQ ID 8869> and protein <SEQ ID 8870> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8 
McG: Discrim Score: -12.73 
GvH: Signal Score (-7.5): -6.04

Possible site: 36 





>>> Seems to have no N-terminal signal sequence
ALOM program count: 7 value: -11.46 threshold: 0.0
INTEGRAL Likelihood =-11.46 Transmembrane 427 - 443 ( 417 - 447)
INTEGRAL Likelihood = -9.87 Transmembrane 99 - 115 ( 96 - 121)
INTEGRAL Likelihood = -9.39 Transmembrane 166 - 182 ( 154 - 185)
INTEGRAL Likelihood = -6.21 Transmembrane 259 - 275 ( 257 - 276)
INTEGRAL Likelihood = -6.21 Transmembrane 229 - 245 ( 223 - 247)
INTEGRAL Likelihood = -6.10 Transmembrane 44 - 60 ( 40 - 66)
INTEGRAL Likelihood = -4.51 Transmembrane 314 - 330 ( 312 - 331)
PERIPHERAL Likelihood = 0.90 212
modified ALOM score: 2.79
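The ALOM summary above can be cross-checked against its own likelihood list. The sketch below rests on relationships inferred from the listings in this document only, not on any documented definition: "count" appears to be the number of likelihoods below the threshold, "value" the most negative likelihood, and the modified score is numerically consistent with 0.5 - value/5. The function name `alom_summary` is illustrative.

```python
def alom_summary(likelihoods: list[float], threshold: float = 0.0) -> tuple[int, float, float]:
    """Return (count, value, modified score) under the inferred relationships."""
    count = sum(1 for x in likelihoods if x < threshold)  # spans below the threshold
    value = min(likelihoods)                              # most negative likelihood
    # Assumption: observed fit to the printed scores, not a documented formula.
    modified = round(0.5 - value / 5, 2)
    return count, value, modified


# Likelihoods from the listing above (seven INTEGRAL lines, one PERIPHERAL line).
likelihoods = [-11.46, -9.87, -9.39, -6.21, -6.21, -6.10, -4.51, 0.90]
count, value, modified = alom_summary(likelihoods)
```

The same relationship reproduces the other ALOM summary given earlier in the document for value -12.84 (modified score 3.07), which is the only further data point available here.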



Reasoning Step: 3 

Final Results

bacterial membrane --- Certainty=0.5585 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

ORF01027(379 - 1656 of 1968) 

EGAD|33392|34706(15 - 423 of 427) cymF protein {Klebsiella oxytoca}
GP|854233|emb|CAA60005.1||X86014 cymF {Klebsiella oxytoca} PIR|S63615|S63615 malF protein
homolog cymF - Klebsiella oxytoca

%Match =23.8 

%Identity = 41.3 %Similarity =64.5 

Matches = 171 Mismatches = 140 Conservative Sub.s = 96 

20 132 162 192 222 252 282 312 342 

VLLFIAILTWKSNI1AITIjNV*NNSIKTSLKQNS 



ML 



... IiATK] -b'LIMC4IiANLKNKQIWGLLFLISEIL?LITFVYQVIPAVKGLISLGTQEQGMTTKTvr)GIKIQVATQ 
HUM I = = =11 =11= =1 =1= = == I =111=111 I 111 
LSEGKSMRIFPASFLIMGATQLISGHWIKGSVFLLFQI-WISNINLLLWATQ3LITLGTVAQ TRSGFDI VA 



GDNSMLMLIFGLASLIFXCVFAYIYWSNIXSAAHLLTLKEEGREIPSFKKDIKSLTDGRFHMTLMSIPLIGVLLFTILPL 
||||.:||. |= ==|| :=|| || = = || = ==== I || === =1 = I |=|= 

GDNSIFMLVEGWAFIFLFFS IFVYWDNI KDAQ VCEKCHQS F7EQLRTIYDKRFATIMIAPAFIACIAFI IMPM 



852 879 909 939 966 996 1026 1056 

VYMICLAFTira)H-imLPPKSLFDWGFANFGNIFSGRMAS-TFFPVLSOTLIWAVFATraiFFFGII]MLIOT , KGLKF 
= = = = = 111 =1 = 111 = 1 1111= II =1 1= I II = ll = = ll 111= II =111 ■■ I = 

IITVLVSLTNYSAPHHIPPKNLvDWGLKNFITLFE^ 



1086 1116 1146 1176 1206 1236 1266 1296 

KKMWRTIFVITmWQFISLLIMRNLLSDAGPWALLIKHGLISSAHPLPFLSDPWAKFSIIFVNMWVGIPVTMLVATG 
45 || || :|:= |:| |::||| | ||: |||h I II = 11111= II ==l l==lll I 11= II 

KKAWRWFILPYAIPAFVTLLIFRLLLNGIGPVNSTLNSWG IDS IGFLSDPLIAKMTVIAVS VWVGAPYFMLLITG 

240 250 260 270 280 290 300 



1326 1356 1386 1416 1446 1476 1506 1536 

II^TOPAEQIEAAEIDGANKFQVFQSITFPQILLIMTPTLIQQFIGNIHNFm'IYLLTQG«PTOSTYYQAGSTDLLVTWL 
= 1=1 = 11=1=111=111 1= 11=1 =1 = 1=1= I I III 11111=111 I I II 11=1=11= 
AMTNIPFJDLYEASEVDGASKFQQFREITLPMVLHQVAPSLVMTFAHNFNNFGAIYLLTEGGPINPEYRFAGHTDILITWI 
320 330 340 350 360 370 380 



1566 1596 1626 1656 1686 1716 1745 1776 

YNLTVTAADYNLASWGILIFILSAVFSLLAYTRTNSYKEGAAK* * IRKNVLTLLLFIFY**YYQLCGSFPLFGSFSQAS 
I 11= I =111= l=ll== :=!== = I 1=11 

YKLTLDFQQYQI ASVI S 1 1 1 FLFLS IFAIWQFRRMKS 7KEDVGM 
400 410 420 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



WO 02/34771 



PCT/GB01/04789 



-1840- 

Example 1641 

A DNA sequence (GBSx1736) was identified in S.agalactiae <SEQ ID 5075> which encodes the amino
acid sequence <SEQ ID 5076>. This protein is predicted to be maltose/maltodextrin-binding protein 
precursor. Analysis of this protein sequence reveals the following: 

Possible site: 41 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -3.98 Transmembrane 25 - 41 ( 24 - 43) 

Final Results 

bacterial membrane --- Certainty=0.2593 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9999> which encodes amino acid sequence <SEQ ID 
10000> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



t- A+L +VA G+ A 





15 


Sbjct: 


3 




75 


Sbjct: 


63 




134 


Sbjct: 


121 




194 


Sbj ct : 


179 


Query: 


236 


Sbjct: 


239 


Query: 


293 


Sbjct: 


293 






Sbjct: 





+DT +SL A GK Y P IBS V+YYNK L D K+4 



A K A + GVA P 



PANSSIQSSDSVQKDEIAKAVIEMGSSDKYTTVMPKLSQMSTFWTESAAILSDTYSGK 410 
PAN+ +S + DEL AVI+ K T +P +SQMS W + +L D SG+ 

PANTEARSYAEGKNDELTTAVIK- - -QFKNTQPLPNISQMSAVWDPAKNMLFDAVSGQ 399 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5077> which encodes the amino acid 
sequence <SEQ ID 5078>. Analysis of this protein sequence reveals the following: 

Possible site: 28 



> May be a lipoprotein 



Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 
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>GP:AAA26925 GB:L08611 MalX [Streptococcus pneumoniae] 
Identities = 126/423 (29%), Positives = 191/423 (44%), Gaps = 50/423 (11%) 

Query: 13 SLTLASTLLVGCGSGSICDK- - KEAGADSKTI KMtfVPTGSKKSYADTIAK- FEKDSGYTVK 69 

++TLAS LLV CGS + DK ++ K + ++V G KSY + +AK +EK++G V 

Sbjct: 14 TVT1ASLLLVACGSKTADKPMSGSSEVKELTVYVDEG-YKSYIEEVAKAYEKEAGVKVT 72 

Query: 70 WESEDPKRQEKIKKD--ASTAADVFSLPHDQLGQLVESGTIQEVPEKYNKEIAATSTDQ 127 

+ + +K+ D + DV P+D++G L G + EVK+ T4- 

Sbjct: 73 LKTGDALGGLDKLSLDNQSGWPDVMMAPYDRVGSLGSDGQLSEV--KLSDGAKTDDTTK 130 

Query: 128 ALVGAQYKGKTYAFPFGIESQVLFYNKSKLAAEDVTSYD TITTKATFGGTFKQ 180 

+LV A GK Y P IES V++YNK + T D +K F G + 

Sbjct: 131 SLVTAA-NGKVYGAPAVIESLVMYYNKDLVKDAPKTFADLENLAKDSKYAFAGEDGKTTA 189 

Query: 181 ANTYATGPLFMSVGNTLFGEKGEDVKGTNWGNEKGAAVL KWIADQAS 227 

N Y T L G -t-FG+NG+D K N+ A + KW 

Sbjct: 190 FLADWTNFYYTYGLIAGNGAWFGQNGKDAKDIGLAlSnDGSIAGINYAKSWYEKWPKGMQD 249 

Query: 228 NKGFVSLDANNVMSKFGDGSVASFESGPt'IDYEAAQKAIGKEEILGVAIYPKVTIGGETVQQ 287 

+G N + ++F +G A+ GPW +A + A K N GVA P + G E 
Sbjct: 250 TEG AGNLIQTQFQEGKTAAIIDGPWKAQAFKDA- -KVNYGVATIPTLPNGKE Y 300 

Query: 288 KAFLGVKLYAVNQAPAKGDTKRIAASYKLASYLTNAESQENQFKTRNIVPANKEVQSSEA 347 

AFGK + + QA K + ASK +L EQ++ N +PAN E +S 

Sbjct: 301 AAFGGGKAWVIPQA VKNLEASQKFVDF1.VATEQQKVLYDKTNEIPAMTEARSYAE 355 

Query: 348 VQSNELAKTVITMGSSSDYTWMPKLSQMGTFWTESAAILSDAFNG KIKENDYLTK 403 

+++EL VI + T +P 4SQM W + +L DA +G K ND +T 
Sbjct: 356 GKNDELTTAVIKQFKN TQPLPKISQMSAVWDPAKNMLFDAVSGQKDAKTAANDAVTL 412 

Query: 404 LQQ 406 

Sbjct: 413 IKE 415 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 278/415 (66%) , Positives = 334/415 (79%) , Gaps = 6/415 (1%) 

Query: 21 TWKKLLVSTAADSWAGGAIAATHSNSVD AASKTTIKLWVPTDSKASYKAIVKKFZ 76 

+W+K++V A+L++ A 4- S S D A TIKLWVPT SK SY + KF+ 

Sbjct: 3 SWQKVIVGGASLTL-ASTLLVGCGSGSKDKKEAGADSKTIKLWVPTGSKKSYADTIAKFE 61 

Query: 77 KENKGVTVKMIESNDSKAQENVKKDPSKAADVFSLPHDQLGQLVESGVIQEIPEQYSKEI 136 

K++ G TVK++ES D KAQE +KKD S AADVFSLPHDQLGQLVESG IQE+PE+Y+KEI 
Sbjct: 62 KTJS-GYTVKVVESEDPKAQEKIKKDASTAADVFSLPHDQLGQLVESGTIQEVPEKYNKEI 120 

Query: 137 AKNDTKQSLTGAQYKGKTYAFPFGIESQVLYYNKTKLTADDVKSYETITSKGKFGXQLKA 196 

A T Q4L GAQYKGKTYAFPFGIESQVL+YNK+KL A+DV SY+TIT+K FG K 
Sbjct: 121 AATSTDQALVGAQYKGKTYAFPFGIESQVLFYNKSKLAAEDVTSYDTITTKATFGGTFKQ 180 

Query: 197 ANSYOTGPXFLSVGDTLFGKSGEDAKGITWGNEAGVSVLKJ^IADQKKNDGFVNLTAENTM 256 

AN+Y TGP F+SVG+TLFG++GED KGTNWGNE G +VLKWIADQ N GFV+L A N M 
Sbjct: 181 ANTYATGPIiFMSVGNTLFGENGED\TCGTNWGNEKGAAVLKWIADQASNKGFVSLDANNVM 240 

Query: 257 SKFGDGSVHAFESGPWDYDAAKKAVGEDKIGVAWPTMKIGDKEVQQKAFLGVKLYAVNQ 316 

SKFGDGSV +FESGPWDY+AA+KA+G++ +GVA+YP + IG + VQQKAFLGVKLYAVNQ 
Sbjct: 241 SKFGDGSVASFESGPWDYEAAQKAIGKENLGVAIYPKVTIGGETVQQKAFLGVKLYAVNQ 300 

Query: 317 APAGSNTKRISASYKLAAYLTNAESQKIQFEKRHIVPANSSIQSSDSVQKDELAKAVIEM 376 

APA +TKRI+ASYKLA+YLTNAESQ+ QF+ R+IVPAN +QSS++VQ +ELAK VI M 
Sbjct: 301 APAKGDTKRIAASYKIASYLTNAESQENQFKTI^IVPANKEVQSSEAVQSNEIiAKTVITM 360 

Query: 377 GSSDKYTTVMPKLSQMSTFWTESAAILSDTYSC-KIKSSDYLKRLKQFDKDIAKTK 431 

GSS YT VMPKLSQM TFWTESAAILSD ++GKIK +DYL +L4QFDKDIA TK 
Sbjct: 361 GSSSDYTVVMPKLSQMGTFWTESAAILSDAFNGKIKENDYLTKLQQFDKDIAATK 415 
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SEQ ID 5076 (GBS649) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 132 (lane 2 & 3; MW 76kDa) and in Figure 186 (lane 7; MW 76kDa). It was
also expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 
132 (lane 7; MW 51kDa) and in Figure 178 (lane 8; MW 51kDa). 

GBS649-His was purified as shown in Figure 229, lane 8. Purified GBS649-GST is shown in Figure 245, 
lanes 6 &73. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1642 

A DNA sequence (GBSx1737) was identified in S.agalactiae <SEQ ID 5079> which encodes the amino
acid sequence <SEQ ID 5080>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2462 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAD02112 GB:AF035082 putative maltose operon transcriptional
repressor [Lactococcus lactis] 
Identities = 43/61 (70%) , Positives = 49/61 (79%) 

Query: 2 VTIKDVAAKAGVNPSTVSRVLKDNASISSKTKERVKKAMEELGYVPNVAAQMLASGLTQN 61

VTIKDVA KAGVN STVSRV+KD++ IS KTK +V+KAM ELGY N AAQ+LASG T
Sbjct: 3 VTIKDVAKKAGVNASTVSRVIKDSSEISDKTKVKVRKAMHELGYRRNAAAQILASGKTNT 62

Query: 62 I 62 
I 

Sbjct: 63 I 63 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5081> which encodes the amino acid
sequence <SEQ ID 5082>. Analysis of this protein sequence reveals the following: 

Possible site: 44 

o N-terminal signal sequence 

-3.93 Transmembrane 259 - 285 ( 266 - 287) 

Final Results 

bacterial membrane --- Certainty=0.2572 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 53/62 (85%) , Positives = 57/62 (91%) 

Query: 1 MVTIKDVAAKAGVNPSTVSRVLKDNASISSKTKERVKKAMEELGYVPNVAAQMLASGLTQ 60

MVTIKDVA KAGVNPSTVSRVLKDN SIS KTKE+V+KAM +LGYVPNVAAQ+LASGLT
Sbjct: 26 MVTIKDVAQKAGVNPSTVSRVLKDNRSISMKTKEKVRKAMADLGYVPNVAAQILASGLTH 85

Query: 61 NI 62
NI

Sbjct: 86 NI 87

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1643 

A DNA sequence (GBSx1738) was identified in S.agalactiae <SEQ ID 5083> which encodes the amino
acid sequence <SEQ ID 5084>. Analysis of this protein sequence reveals the following: 

Possible site: 56 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -7.70 Transmembrane 14 - 30 ( 8 - 34) 

INTEGRAL Likelihood = -6.90 Transmembrane 66 - 82 ( 63 - 85) 

INTEGRAL Likelihood = -6.69 Transmembrane 110 - 126 ( 105 - 128) 

INTEGRAL Likelihood = -3.93 Transmembrane 132 - 148 ( 129 - 149) 



Final Results 

bacterial membrane --- Certainty=0.4079 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9443> which encodes amino acid sequence <SEQ ID 9444> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.



FGWDSAFFIMIINIPLLLLCYFGL3KQTFLKTVYGSWIFPVFIKLTQSVPTLTHNPLLAA 68 
+G+++A+- IINIPL 4- LG + LKT+ GS P+ + LT+ + TH+ LLAA 

YGFEAAYVQWI INIPLF IAGVI LLSGKFGLKTLAGSVFLPLWF jTRDIQPATHHELLAA 111 



Ident: 






9 


Sbj ct : 


52 




69 


Sbjct: 


112 




129 


Sbjct: 


172 




189 


Sbjct: 


232 



+FGGV +G G+GIV+ STGGT + Q -4- KY+ +SLG+ + +ID3++ 4 



DRGVT+I GG 



A related DNA sequence was identified in S.pyogenes <SEQ ID 5085> which encodes the amino acid
sequence <SEQ ID 5086>. Analysis of this protein sequence reveals the following: 

Possible site: 57 
>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood = -6.21 Transmembrane 104 - 120 ( 101 - 123) 
INTEGRAL Likelihood = -3.93 Transmembrane 147 - 163 ( 142 - 167)

INTEGRAL Likelihood = -3.29 Transmembrane 169 - 185 ( 169 - 186) 



Final Results 

bacterial membrane --- Certainty=0.3484 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 



Query: 7 DLLLVTIGSFITAIGFNTMFVDNHIASGG^GIAWIKALFGISPSLFLMASNIPLLLMC 66 
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Query: 67 YFFLGKQNFIKTLYGSWIYPIAIRSTNSLPTLTHNQLLAAIFGGIICGIGLGMVFWGNSS 125 

LG + +KTL GS P+ + T + TH++LLAAIFGG+ GIG+G+V+ G S 
Sbjct: 72 VILLGGKFGLKTLAGSVFLPLWFLTRDIQPATHHELLflRIFGGVGIGIGIGIVYLGKGS 131 

Query: 127 TGGTGILTQILHKYSPLSLGVAMTIVDGISVLMGFIALSADDVMYSTIGLFVIGYVISVM 186 

TGGT + QI+HKYS LSLG + I+DG+ V+ I + + +Y+ +G++V I V+ 
Sbjct: 132 TGGTAIAAQIIHKYSGLSLGKCIAIIDGMIWTAMIVFNIEQGLYAMLGVYVSSKTIDW 191 

Query: 187 ENGFDSSKMVMIISKDYQAIREYITTVMDRGVTKLPIRGGYTTSDKIMLMAIVSSHELPT 246 

+ GF+ SK +II+K QA++E 4- +DRGVTK+ GGYT D+ +LM +V E 
Sbjct: 192 QVGFNRSKMALIITKQEQAVKEAVLQKIDRGVTKISAVGGYTDDDRPILMCWGQTEFTK 251 

Query: 247 LQEKILEIDDTAFIWMPAAQVMGRGF 273 

L++ + +ID++AF++V A++V+G GF 
Sbjct: 252 LKQIVKQIDESAFVIVADASEVLGEGF 278 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 135/252 (53%), Positives = 190/252 (74%) 

Query: 1 MAVSFHEVFGWDSAFFIMIINIPLLLLCYFGLGKQTFLKTVYGSWIFPVFIKLTQSVPTI, 60 

+AV +FG + F+M N1PLLL+CYF LGKQ F+KT+YGSWI+P+ 1+ T S+PTL 
Sbjct: 39 IAWIKALFGISPSLFLMASNIPLLLMCYFFLGKQNFIKTLYGSWIYPIAIRSTNSLPTL 98 

Query: 61 THNPLIAALFGGVIVGCGLGIVFWSDSSTGGTGIIIQFLGKYTPISLGQGVILIDGLVTI 120 

THN LLAA+FGG+I G GLG+VFW +SSTGGTGI+ Q L KY+P+SLG + ++DG+ + 
Sbjct: 99 THNQLIAAIFGGIICGIGLGMVFWGNSSTGGTGILTQILHKYSPLSLGVAMTIVDGISVL 158 

Query: 121 VGFIAFDSDTVMFSIIGLITISYIINAIQJTGFTTLSTVLIVSQEHQKIKTYINTVADRGV 180 

+GF+A +D VM+S IGL I Y+I+ ++ GF + V+I+S+++Q 1+ YI TV DRGV 
Sbjct: 159 MGFIALSADDVMYSTIGLFVIGWISVMENGFDSSKNVMIISKDYQAIREYITTVMDRGV 218 

Query: 181 TEIPVKGGYSGTNQIMLMTTIAGYEFAKLQEAIAEIDETAFITVTPTSQASGRGFSLQKN 240 

T++P++GGY+ +++IMLM ++ +E LQE I EID+TAFI V P +Q GRGFSL K 
Sbjct: 219 TKLPIRGGYTTSDKIMLMAIVSSHELPTLQEKILEIDDTAFIWMPAAQVMGRGFSLTKQ 278 

Query: 241 HGRLDEDILMPM 252 

+ R D+D+L+PM 
Sbjct: 279 YKREDKDVLLPM 290 

A related GBS gene <SEQ ID 8871> and protein <SEQ ID 8872> were also identified. Analysis of this 

protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 6 
McG: Discrim Score: 1.57 
GvH: Signal Score (-7.5): -2.56 

Possible site: 56 
>>> Seems to have an uncleavable N-term signal seq 
ALOM program count: 4 value: -7.70 threshold: 0.0 

INTEGRAL Likelihood = -7.70 Transmembrane 14- 30 ( 8- 34) 

integral Likelihood = 

INTEGRAL Likelihood = 

INTEGRAL Likelihood = 



* Reasoning Step: 3 



90 Transmembrane 66 - 82 ( 63 - 
69 Transmembrane 110 - 126 ( 105 - 
132 - 148 ( 129 - 



Final Results 

bacterial membrane - 
bacterial outside - 
bacterial cytoplasm Certainty=0 . 0000 (Not Clear) ■ 

The protein has homology with the following sequences in the databases:

ORF02139(113 - 1008 of 1356) 

OMNI|HT01BS4111(51 - 325 of 327) conserved hypothetical protein 
%Match =19.3 

%Identity = 37.1 %Similarity = 62.1
Matches = 101 Mismatches = 99 Conservative Sub.s = 68

27 57 87 117 165 

ARAIPSFIVGSALTGALVGIiAGIKLMAPHGGIFVIALTSNPLLYIL FILIGAWSGVLFGLF 

I 1 = 11 :||||| :: II --I 

VCFFISYILDFTAALAYYHCIWVLFTSNC£RIKMLSESIC^

10 20 30 40 50 60 70 80 

216 246 276 306 336 366 396 426 

RKIK*LISTYPMjH*IKGE*XIVILXXLIN*XXGGISGIAVSFXEVFGWDSAFFIMIINIPLLLLCYFGLGKQTFLKTVY
|| ||:||:: : :|:::|: ||||:: || : |||: 

NKI AAGGVSGIST- ILQSYGFEAAYVQWI INI PLFIAGVI LLGGKFGIiKTLA 

90 100 110 120 130 

456 486 516 546 576 606 636 666 

GSWIFPVFIKLTQSVPTLTHNPLLAALFGGVIVGCGLGIVFWSDSSTGGTGIIIQFLGKYTPISLGQGVILIDGLVTIVG
II = h : ||: = lb lllhllll = I 1 = 111= Mill = I = 11= =111= = Mll = = = 

GSVFLPLWFLTRDIQPATHHELIAAIFGGVGIGIGIGIVYLGKGSTGGTALAAQIIHKYSGLSLGKCLAIIDGMIVVTA
150 160 170 180 190 200 210 

696 726 756 786 816 846 876 906

FIAFDSDTVMFSIIGLITISYIINAIQTGFTTLSTVLIVSQEHQKIKTYINTVADRGVTEIPVKGGYSGTNQIMLMTTIA

= |= . -...::|s I 1= M M II"" I =1 = 11111*1 III' = = Ml « 

MIVFNIEQGLYAMLGVYVSSKTIDWQVGFNRSKMALIITKQEQAVKmVLQKIDRGVTKISAVGGYTDDDRPILMCWG 
230 240 250 260 270 280 290 


936 966 996 1026 1056 1086 1116 

GYEFAKLQEAIAEIDETAFITVTPTSQASGRGFSLQKNHGRLDEDILMPM*SIDN*SFF**NSR*NIHKR*QNC

M H== = =111=11= I 1= I II 
QTEFTKLKQIVKQIDESAFVIVADASEVLGEGFKRA
310 320

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1644 

A DNA sequence (GBSx1739) was identified in S.agalactiae <SEQ ID 5087> which encodes the amino
acid sequence <SEQ ID 5088>. This protein is predicted to be ABC transporter, ATP-binding protein 
(b0820). Analysis of this protein sequence reveals the following: 
Possible site: 56 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.3122 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC24918 GB:AF012285 YkpA [Bacillus subtilis] 
Identities = 355/540 (65%) , Positives = 451/540 (82%) , Gaps = 4/540 (0%) 

Query: 1 MLTVSDVSLRFSDRKLFDEVNINFTAGNTYGLIGANGAGKSTFLKILAGDIEPTTGHIAL 60

M+ V++VSLRF+DRKLF++VNI FT GN YGLIGANGAGKSTFLK+L+G+IEP TG + +
Sbjct: 1 MIAVOTVSLRFADRKLFEDVNIKFTPGNCYGLIGANGAGKSTFLKVLSGEIEPQTGDVHM 60

Query: 61 GPDERLSVLRQNHFDYEDERVIDVVIMGNETLYSIMKEKDAIYMKEDFSDEDGVRAAELE 120
P ERL+VL+QNHF+YE+ V+ VVIMG++ LY +M+EKDAIYMK DFSDEDG+RAAELE

Sbjct: 61 SPGERLAVLKQNHFEYEEYEVLKVVIMGHKRLYEVMQEKDAIYMKPDFSDEDGIRAAELE 120



Query: 121
Sbjct: 121

Query: 181
EPTN LD+Q+I WLE+FLI+PENTVIW3HDRHFLNKVCTH+ADLDF KI+++VGNYDFW
Sbjct: 181

Query: 241
S+IKQLQEFVRRFSMASKSKQATSRKK+L+KI L4-+I PS
Sbjct: 241

Query: 301
SR+YP+VNF ERE+GND+L VE L+ TIDG K+LDN+SFI+ DK A G+N++ T
Sbjct: 301

Query: 361
^ G4+E 4 GT KWGVTTS++Y PKDNS F + ++++WLRQ+ S
Sbjct: 361

Query: 419
RGFLGRMLFSG+EV+K NVLSGGEKVR MLSK ML +N+L+LD+PTNHLDLESI++LM
Sbjct: 420

Query: 479
+GL FK +4+F SHDH+F+QT+AN II ++ NG++D+ +YDEFLEN +VQ K+ +L+
Sbjct: 480

A related DNA sequence was identified in S.pyogenes <SEQ ID 5089> which encodes the amino acid 
sequence <SEQ ID 5090>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.3124 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 497/539 (92%) , Positives = 525/539 (97%) 

Query: 1 MLTVSDVSLRFSDRKLFDEVNINFTAGNTYGLIGANGAGKSTFLKILAGDIEPTTGHIAL 60
+LTVSDVSLRFSDRKLFD+VNI FTAGNTYGLIGANGAGKSTFLKILAGDIEP+TGHI+L
Sbjct: 1 LLTVSDVSLRFSDRKLFDDVNIKFTAGNTYGLIGANGAGKSTFLKILAGDIEPSTGHISL 60

Query: 61
Sbjct: 61

Query: 121
G FAELGGWEAESEASQLLQNLNI E+LEYQI-mSEIANGDKVKVLLAKALFGKPDVLLLD
Sbjct: 121

Query: 181
EPraGLDIQSI+V^EDFLIDFENWIWSKDRHFLNKVCTEiMADLDFGKIKLFVGNYDFW
Sbjct: 181

Query: 241
K+SSELAARLQADRNAKAEEKIK+LQEFVARFSANASKSKQATSRKKMLiDKIELEEIVPS
Sbjct: 241

Query: 301
SRKYPF+NFKAEREMGND LTVENLSVTIDGEKI+DNISFILRPGDK A+IGQNDIQTTA
Sbjct: 301 SRKYPFINFKAEREMGNDFLTVENLSVTIDGEKIIDNISFILRPGDKAAIIGQNDIQTTA 360



Query: 361 LIRALMGDIEYEGTIKWGVTTSRSYLPKDNSRDFASGESILEWLRQFASKEEDDNTFLRG 420
L+RAL DI+YEGTIKWGVTTSRSYLPKDNS+DFA+ ESILEWLRQFASK EDD+TFLRG

Query: 421 FLGRMLFSGDEVQKSVNVLSGGEKVRVMLSKLMLLKSNVLVLDDPTNHLDLESISSLNDG 480

FLGRMLFSGDEV KSVNVLSGGEKVRVMLSKLMLLKSNVL+LDDPTNHLDLESISSLNDG
Sbjct: 421 FLGRMLFSGDEVKKSVNVLSGGEKVRVMLSKLMLLKSNVLILDDPTNHLDLESISSLNDG 480

Query: 481 LKDFKESIIFASHDHEFIQTLANHIIVLSKNGVIDRIDETYDEFLENTEVQAKVAQLWK 539

+KDFKES+IFASHDHEFIQT+ANHI+V+SKNGVIDRIDETYDEFL+N EVQA+VA+LWK
Sbjct: 481 IKDFKESVIFASHDHEFIQTIANHIVVISKNGVIDRIDETYDEFLDNPEVQARVAELWK 539

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 



Example 1645 

A DNA sequence (GBSx1740) was identified in S.agalactiae <SEQ ID 5091> which encodes the amino
acid sequence <SEQ ID 5092>. Analysis of this protein sequence reveals the following: 



Possible site:
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = ... Transmembrane 14 - 30 ( 8 - ...
INTEGRAL Likelihood = ... Transmembrane 384 - 400 ( 382 - ...
INTEGRAL Likelihood = ... Transmembrane ... - 428 ( 408 - ...
INTEGRAL Likelihood = ... Transmembrane 163 - 179 ( 155 - ...
INTEGRAL Likelihood = ... Transmembrane 322 - 338 ( 320 - ...
INTEGRAL Likelihood = ... Transmembrane 297 - 313 ( 290 - ...
INTEGRAL Likelihood = ... Transmembrane ... - 376 ( 357 - ...
INTEGRAL Likelihood = ... Transmembrane 438 - 454 ( 437 - ...
INTEGRAL Likelihood = ... Transmembrane ... - 152 ( 136 - ...
INTEGRAL Likelihood = ... Transmembrane ... - 126 ( 106 - ...
INTEGRAL Likelihood = ... Transmembrane 232 - 248 ( 232 - ...
INTEGRAL Likelihood = ... Transmembrane 832 - 848 ( 832 - ...
INTEGRAL Likelihood = ... Transmembrane 200 - 216 ( 200 - ...

Final Results

bacterial membrane --- Certainty=0.4885 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC14608 GB:U95840 transmembrane protein Tmp5 [Lactococcus lactis]

Identities = 140/260 (53%) , Positives = 182/260 (69%) , Gaps = 6/260 (2%) 



Query: 16
Sbjct: 14

Query: 74
Sbjct: 74

Query: 134
Sbjct: 134

Query: 194
Sbjct: 194

Query: 254

IYWGS +ILA D +HQYV + RNILH GS

GLGLN YA S+YY+GSFL P +FF++K+MPDA+YL TI K GLIGLS FV+

^ +L 1ST ++LMSF SQ+EI MWLDVFIL+PL++ G+ H

FIQNYYFGFM AIF LYF

V ERK LYF+SL L
Sbjct: 251 LKSNNSDALSTLSGIFTENS 270



A related DNA sequence was identified in S.pyogenes <SEQ ID 5093> which encodes the amino acid 
sequence <SEQ ID 5094>. Analysis of this protein sequence reveals the following: 



Possible site: 51
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = ... Transmembrane ... - 31 ( 6 - ...
INTEGRAL Likelihood = ... Transmembrane 2... - 217 ( 196 - ...
INTEGRAL Likelihood = ... Transmembrane ... - 425 ( 402 - ...
INTEGRAL Likelihood = ... Transmembrane ... - 245 ( 227 - ...
INTEGRAL Likelihood = ... Transmembrane ... - 177 ( 153 - ...
INTEGRAL Likelihood = ... Transmembrane ... - 307 ( 290 - ...
INTEGRAL Likelihood = ... Transmembrane ... - 149 ( 130 - ...
INTEGRAL Likelihood = ... Transmembrane ... - 395 ( 376 - ...
INTEGRAL Likelihood = ... Transmembrane ... - 121 ( 103 - ...
INTEGRAL Likelihood = ... Transmembrane ... - 848 ( 830 - ...
INTEGRAL Likelihood = ... Transmembrane ... - 452 ( 435 - ...
INTEGRAL Likelihood = ... Transmembrane ... - 334 ( 314 - ...
INTEGRAL Likelihood = ... Transmembrane ... - 372 ( 355 - 372
INTEGRAL Likelihood = ... Transmembrane ... - 95 ( 80 - ...

Final Results

bacterial membrane --- Certainty=0.4715 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>




The protein has homology with the following sequences in the databases: 

>GP:AAC14608 GB:U95840 transmembrane protein Tmp5 [Lactococcus lactis] 
Identities = 134/269 (49%) , Positives = 183/269 (67%) , Gaps = 8/269 (2%) 

Query: 5 NKWIIAGLASFLFPLSIIFIILLSKGIYYNSDKTILASDAFHQYVIFAQNFRNIMH--GS 62
NKW + LASF PL ++ I+L GIY+ S ++ILA DA+HQYV +RNI+H GS
Sbjct: 7 NKWAL--LASFFIPLILMVIVLAMTGIYHGSSRSILAGDAYHQYVAIHSLYRNILHSGGS 64

Query: 63 DSFFYTFTSGLGINFYALMCYYLGSFFSPLLFFFNLTSMPDAIYLFTLIKFGLIGLAACY 122
F YTFTSGLG+N YA YY+GSF P FFF++ SMPDA+YLFT+IKFGLIGL++
Sbjct: 65 QGFLYTFTSGLGLNLYAFSAYYMGSFLMPFTFFFDVKSMPDALYLFTIIKFGLIGLSSFV 124

Query: 123 SFHRLYPKISAFLMISISVFYSLMSFLTSQMELNSWLDVFILLPLVILGLNKLITENKTR 182
SF +Y K+S ++SIS ++LMSFLTSQ+E+ WLDVFILLPL+I GL++L+ E K
Sbjct: 125 SFKNMYQKLSNLTVLSISTAFALMSFLTSQLEITMWLDVFILLPLIIWGLHRLMDERKRW 184

Query: 183 TYYLSISLLFIQNYYFGYMIALFCILYALVCLLRI^FNKMFIAFVRFTAVSICAALTSA 242
Y++S+ +LFIQNYYFG+M+A+F +LY L R+ 4 + F S A + S
Sbjct: 185 LYFVSLLILFIQNYYFGFMVAIFLVLYFLA---RMTYEKWSWTKVLDFWSSTLAGIASL 241

Query: 243 LVILPTYLDL-STYGENLSPIKQLVTNNA 270
+++LP YLDL S + LS + + T N+
Sbjct: 242 IMLLPMYLDLKSNNSDALSTLSGIFTENS 270

An alignment of the GAS and GBS proteins is shown below. 

Identities = 432/836 (51%) , Positives = 569/836 (67%) , Gaps = 2/836 (0%)

Query: 16 SFLLPFIIIVCILFTKNIYWGSPTTILASDGFHQYVIFNQALRNILHGSNSLFYTFTSGL 75 
SFL P II IL + IY+ S TILASD FHQYVIF Q RNI+HGS+S FYTFTSGL

Sbjct: 14 SFLFPLSIIFIILLSMGIYYNSDKTILASDAFHQYVIFAQNFRNIMHGSDSFFYTFTSGL 73 

Query: 76 GLNFYALSSYYLGSFLSPIWFFNLKNMPDAIYI^TICKIGLIGLSMFvTLCKRHCJCVNR 135 
G+NFYAL YYLGSF SP+++FFNL +MPDAIYL T+ K GLIGL+ + + + K++ 
Sbjct: 74 GINFYALMCYYLGSFFSPLLFFFNLTSMPDAIYLFTLIKFGLIGLAACYSFHRLYPKISA 133

Query: 136 VLLLVISTCYSLMSFSISQIEINMWLDVFILIPLVVLGVDQLLWERKPILYFLSLTALFI 195

L++ IS YSLMSF SQ+E+N WLDVFIL+PLV+LG+++L+ E K Y+LS++ LFI 
Sbjct: 134 FLMISISVFYSLMSFLTSQMELNSWLDVFILLPLVILGLNKLITENKTRTYYLSISLLFI 193 



WO 02/34771 



PCT/GB01/04789 



Query: 196
Sbjct: 194

Query: 256
Sbjct: 254

Query: 316
Sbjct: 314

Query: 376
Sbjct: 374

Query: 436
Sbjct: 434

Query: 496
Sbjct: 494

Query: 556
Sbjct: 554

Query: 616
Sbjct: 614

Query: 675
Sbjct: 674

Query: 735
Sbjct: 733

Query: 795
Sbjct: 793

QNYYFG+M A+F LY +V + R D F F+ FT +S+ A +TS+++ILPTY DL+

AK IG YDTTKF ++PMIYVGL PL+LS++YFT++

(■ Y++L L L LL++ L Y I SF + Ql

F++ F +LE LNT+YQL +N E

FFR ER L QTGNDSMK+NY GISQFSS+RNR SS +LDRLGF+S GTNLNLRYQNNT+I

DSL G+KYNL+E P +KFGF K T LY+N ++S LAILT VY+D

NQT LLNQLSG TYF +SG N Q+ + + Q + + Y I IPK+SQL

YVS+P I F+N + K ++I +N F+

+SF PHFY L +E+Y +AM I ++ V N VI +Y S + Sh FT+PYD+GW

+ + KAQ GF+ + IPKGKG+V It FIP GFK G+ LS GI+ + + Y
[LPVKKAQGGFLSVTIPKGKGRVILTFIPNGFKLGLSLSCVGIIAYMLLY 848

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1646 

A DNA sequence (GBSx1741) was identified in S.agalactiae <SEQ ID 5095> which encodes the amino
acid sequence <SEQ ID 5096>. Analysis of this protein sequence reveals the following: 

Possible site: 50 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4624 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC45340 GB:AF000658 ORF1 [Streptococcus pneumoniae]
Identities = 111/159 (69%) , Positives = 136/159 (84%) 






Query: 1 MKLKIITVGKLKEKYLKEGVAEYQKRLNRFSKIETIELADEKTPDKASISENQRILDIEG 60 

MK+K++TVGKLKEKYLK+G+AEY KR++RF+K E IEL+DEKTPDKAS SENQ+IL+IEG 
Sbjct: 1 MKIKVVTVGKLKEKYLKDGIAEYSKRISRFAKFEMIELSDEKTPDKASESENQKILEIEG 60

Query: 61 ERILSKIGERDYVIGLAIEGKQLPSESFSHLIDQKMISGYSTITFVIGGSLGLSQKVKKR 120

+RILSKI +RD+VI LAIEGK SE FS +++ I G+ST+TF+IGGSLGLS VK R
Sbjct: 61 QRILSKIADRDFVIVLAIEGKTFFSEEFSKQLEETSIKGFSTLTFIIGGSLGLSSSVKNR 120

Query: 121 ADYLMSFGLLTLPHQLMKLVLMEQIYRAFMIRQGTPYHK 159 

A+ +SFG LTLPHQLM+LVL+EQIYRAF I+QG PYHK 
Sbjct: 121 ANLSVSFGRLTLPHQLMRLVLVEQIYRAFTIQQGFPYHK 159 

A related DNA sequence was identified in S. pyogenes <SEQ ID 5097> which encodes the amino acid 
sequence <SEQ ID 5098>. Analysis of this protein sequence reveals the following: 

Possible site: 50 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4462 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 112/159 (70%), Positives = 133/159 (83%)

Query: 1 MKLKIITVGKLKEKYLKEGVAEYQKRLNRFSKIETIELADEKTPDKASISENQRILDIEG 60

MK+K+I VGKLKE+YLK+G++EYQKRL+RF + E IEL DE+TPDKAS ++NQ I+ E
Sbjct: 1 MKVKLICTGKLKERYLKDGISEYQKRLSRFCQFEMIELTDERTPDKASFADNQLIMSKEA 60

Query: 61 ERILSKIGERDYVIGLAIEGKQLPSESFSHLIDQKMISGYSTITFVIGGSLGLSQKVKKR 120
+RI KIGERD+VI LAIEGKQ PSE+FS LI + GYSTITF+IGGSLGL +KKR

Query: 121 ADYLMSFGLLTLPHQLMKLVLMEQIYRAFMIRQGTPYHK 159 

A+ LMSFGLLTLPHQLM+LVL EQIYRAFMI QG+PYHK 
Sbjct: 121 ANMLMSFGLLTLPHQLMRLVLTEQIYRAFMITQGSPYHK 159 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1647

A DNA sequence (GBSx1742) was identified in S.agalactiae <SEQ ID 5099> which encodes the amino
acid sequence <SEQ ID 5100>. Analysis of this protein sequence reveals the following: 

Possible site: 24 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.3785 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
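
Each "Final Results" block in these examples lists three candidate localizations, each with a certainty score and a verdict. Purely as an illustration, the following minimal Python sketch shows one way such a block could be read programmatically. The function names `parse_final_results` and `top_localization` are hypothetical and are not part of PSORT or of this disclosure; the sample text reproduces the block reported above for <SEQ ID 5100>, and the pattern is written to tolerate stray spaces inside the number.

```python
import re
from typing import List, Tuple

# One row looks like:
#   bacterial cytoplasm --- Certainty=0.3785 (Affirmative) < succ>
# The pattern is an assumption about the layout, not a PSORT specification.
_ROW = re.compile(r"bacterial\s+(\w+)\W*Certainty\s*=\s*([\d.\s]+?)\s*\((.+?)\)")


def parse_final_results(text: str) -> List[Tuple[str, float, str]]:
    """Return (compartment, certainty, verdict) for every complete row."""
    rows = []
    for match in _ROW.finditer(text):
        # Remove spaces that scanning may have inserted, e.g. "0 .3785".
        certainty = float(match.group(2).replace(" ", ""))
        rows.append((match.group(1), certainty, match.group(3)))
    return rows


def top_localization(text: str) -> Tuple[str, float, str]:
    """Return the row with the highest certainty (first row wins ties)."""
    return max(parse_final_results(text), key=lambda row: row[1])


SAMPLE = """\
bacterial cytoplasm --- Certainty=0.3785 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""

best = top_localization(SAMPLE)
print(best)
```

Rows whose certainty value is missing from the text are skipped rather than guessed.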
Example 1648

A DNA sequence (GBSx1743) was identified in S.agalactiae <SEQ ID 5101> which encodes the amino
acid sequence <SEQ ID 5102>. This protein is predicted to be a serine protease. Analysis of this protein 
sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4533 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9445> which encodes amino acid sequence <SEQ ID 9446> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.



Query: 4 NDNIPNGGVTKTSKVNYNNITPTTKAVKKVQNSWSVINYKQQESRSDLSDPYSHFFGNQ 63
N++ N +T+T+ Y N TT+AV KV+++WSVI Y S FGN
Sbjct: 46 ...VFGND 94

Query: 64
EGSGVIYKK+ K AY+VTNNHVI+GA +++I+L+DG+K G++VG+DT
Sbjct: 95

Query: 124
+SD+AWKI S+KV+ +AEF DSSKL +GETAIAIGSPLG+EYAN+VTQGIVSSL R V
Sbjct: 154

Query: 184
+E+GQ +ST AIQTD AINPGNSGG LINI+GQVIGI SSKI++
Sbjct: 214

Query: 244
EG+GFAIP+ND + II QLE NG+V RPALGI M LSN+ + I +L IPSNVT+G++V
Sbjct: 266

Query: 304
r S +DLQS LY H +GD+I +T+YR ++T -1
Sbjct: 326

Query: 364
Sbjct: 385

A related DNA sequence was identified in S.pyogenes <SEQ ID 5103> which encodes the amino acid 
sequence <SEQ ID 5104>. Analysis of this protein sequence reveals the following: 

Possible site: 24 
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -8.76 Transmembrane 11 - 27 ( 6 - 31)

Final Results

bacterial membrane --- Certainty=0.4503 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 250/375 (66%) , Positives = 299/375 (79%) , Gaps = 5/375 (1%) 
Query: 3 HMDNIPNGGVTKTSKVNYNNITPTTKAVKKVQNSWSVINYKQQESRSDLSDFYSHFFGN 62

H+ + KG TS + +NN T TTKAVK VQN+WSVINY+ S S LS+ Y+ FG
Sbjct: 34 HSPSKINSGKATTSNMVFNNTTNTTKAVKRVQNAWSVINYQDNPS-SSLSNPYTKLFGE 92

Query: 63 QGG--NIDKGLQVYGEGSGVIYKKDGKNAYVVTNNHVIDGAKQIEIQLADGSKAVGKLVG 120

N D L ++ EGSGVIY+KDG +AYVVTNNHVIDGAK+IEI +ADGSK VG+LVG
Sbjct: 93 GRSKENKDAELSIFSEGSGVIYRKDGNSAYVVTNNHVIDGAKRIEILMADGSKVVGELVG 152

Query: 121 SDTYSDLAVVKIPSDKVSNIAEFADSSKLNIGETAIAIGSPLGTEYANSVTQGIVSSLKR 180

+DTYSDLAVVKI SDK+ +AEFADS+KLN+GE AIAIGSPLGT+YANSVTQGIVSSL R
Sbjct: 153 ADTYSDLAVVKISSDKIKTVAEFADSTKLNVGEVAIAIGSPLGTQYANSVTQGIVSSLSR 212

Query: 181 TVTMTNEEGQTVSTNAIQTDAAINPGNSGGALINIEGQVIGINSSKISSTSNQTSGQSSG 240

TVT+ NE G+TVSTNAIQTDAAINPGNSGG LINIEGQVIGINSSKISST 4-+G S
Sbjct: 213 TVTLKNENGETVSTNAIQTDAAINPGNSGGPLINIEGQVIGINSSKISSTPTGSNGNS-- 270 



Query: 361 TVTIKLTKTSKDLAK 375 

IKLTKT++DL K 
Sbjct: 391 KADIKLTKTTQDLTK 405 

A related GBS gene <SEQ ID 8873> and protein <SEQ ID 8874> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 10
McG: Discrim Score: 12.68
GvH: Signal Score (-7.5): -1.33

Possible site: 21
>>> Seems to have a cleavable N-term signal seq
ALOM program count: 0 value:

PERIPHERAL Likelihood = 4.56 301
modified ALOM score: -1.41

*** Reasoning Step: 3

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

57.4/75.6% over 386aa 

Streptococcus 

b protease Insert characterized 
ORF02135(307 - 1506 of 1827) 

GP|2109443|gb|AAC45334.1||AF000658(9 - 395 of 397) putative serine protease {Streptococcus pneumoniae}

%Match =34.5 

%Identity =57.3 %Similarity =75.6 

Matches = 223 Mismatches = 89 Conservative Sub.s = 71 

228 258 288 318 348 378 399 429 

RLSTSCGYFLFLAFKV*LRSLS*D*YKNLRR*LWKXKLVSSLLKCSLIIIVSFAGGAFASFVMNH NDNIPNGGVTK 

: = | : :: ::|::|| ||: j| : : :: | 

MESNMKHLKTFYKTOIFQLLWIVISFFSGALGSFSITQLTQKSSVNNSNNNS 
T-SKVNYNNITPTTKAVKKVQNSWSVINYKQQESRS^

I :: II 11=11 ll===lllll I I III =11 = = llllllllh I 11=11 

TITQTAYKNENSTTCjAWKVKDAWSVITYSANRQNS VFGNDDTDTDS-QRISSEGSGVIYKKNDKEAYIVT 



NNHVIDGAKQIEIQLADGSKAVGKLVGSDTYSDLAVVKIPSDKVSNIAEFADSSKLNIGETAIAIGSPLGTEYANSVTQG

11111 = 11 = = = 1 = 1 = 11 = 1 ]==ii=ii = ii=iiiii 1 = 11= =ni inn =iiiiiiiiiiii = iiii=iiii 

NNHVINGASCTDIRLSDGTKVPGEIVGADTFSDIAWKISSE^ 

140 150 160 170 180 190 200

936 956 996 1026 1056 1086 1116 1146 

IVSSLKRTVTMTNEEGQTVSTNAIQTDAAINPGNSGGALINIEGQVIGINSSKISSTSNQTSGQSSGNSVEGMGFAIPSN

mil i i = = =i = ii =n inn nmmi 1111 = 111111 1111= 1 = 1 1 1111 = 11111 = 1 

IVSSLNRNVSLKSEDGQAISTKAIQTDTAINPGNSGGPLINIQGQVIGITSSKIA TNG---GTSVEGLGFAIPAN 

220 230 240 250 260 270 



i = ii iii 11 = 1 nun i iii= = i =i immmi mi m i 1 = 111111111111= 1 

DAINIIEQLEKNGKVTRPALGIQMVNLSNVSTSDIRRLNIPSm^•SGVITOSVQS^MPANGHI J EKYDVITKVDDKEIASS 
290 300 310 320 330 340 350 

1416 1446 1476 1506 1536 1566 1596 1626 

SDLQSLLYGHQVGDSITVTFYRGENKQTVTIKLTKTSKDLAKQRANN*INSSYFN*DIVKLKGLVR*TNPFSKSIESEV* 
=1111 II I =11=1 =1=11 ==l =111 1=1 II 
TDLQSALYNHS IGDTI KITYYRNGKEETTS I KLNKSSGDLES 
370 380 390 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1649 

A DNA sequence (GBSx1744) was identified in S.agalactiae <SEQ ID 5105> which encodes the amino
acid sequence <SEQ ID 5106>. This protein is predicted to be SPSpoJ (spoOJ). Analysis of this protein 
sequence reveals the following: 



Final Results

bacterial cytoplasm --- Certainty=0.4152 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

= 5/257 (1%) 

Query: 1 MEYLETININHIAPNPYQPRLEFNTKELEELANSIKINGLIQPIIVRPSAVFGYELVAGE 60 

ME E I+I I NPYQPR EF+ ++L+ELA SIK NG+IQPIIVR S V GYE++AGE 
Sbjct: 1 MEKFEMISITDIQKNPYQPRKEFDREKLDELAQSIKENGVIQPIIVRQSPVIGYEILAGE 60 

Query: 61 RRLRAAKLAKLESIPAIIKSYNNDDSMQLAIVENLQRSNLSPIEEAKAYSQLLQKKSMTH 120 

RR RA+ LA L SIPA++K ++ + M +I+ENLQR NL+PIEEA+AY L++ K TH 
Sbjct: 61 RRYRASLLAGLRSIPAVVKQISDQEMKVQSI:EI'ILQRENLNPIEEARAYVSLVE-KGFTH 119 

Query: 121 EEIAKYMGKSRPYISNTIRLLNLPPLITSAIEEGKLSSGHARALLSLPDASQQKDWYQRI 180

E+A GKSRPYISN+IRLL+LP I S +E GKLS HAR+L+ L + QQ ++QRI
Sbjct: 120 AEIADKEGKSRPYISNSIRLLSLPEQILSEVENGKLSQAHARSLVGL-NKEQQDYFFQRI 178
Query: 181 LTEDISVRRLEKLLKQEKKTNHKSLQNKDVFLKHQENELAQFLGSKVKLTINKDGAGNIK 240

+ EDISVR+LE LL ++K+ K Q + F++++E +L + LG V++ ++K +G I
Sbjct: 179 IEEDISVRKLEALLTEKKQ---KKQQKTNHFIQNEEKQLRKLLGLDVEIKLSKKDSGKII 235

Query: 241 IAFANQEELNRIINTLK 257

I+F+NQEE +RIIN+LK
Sbjct: 236 IEFSNQEEYSRIINSLK 252

A related DNA sequence was identified in S. pyogenes <SEQ ID 5107> which encodes the amino acid 
sequence <SEQ ID 5108>. Analysis of this protein sequence reveals the following: 

Possible site: 51 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1758 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 146/256 (57%), Positives = 191/256 (74%), Gaps = 1/256 (0%) 

Query: 2 EYLETININHIAPNPYQPRLEFNTKELEELANSIKINGLIQPIIVRPSAVFGYELVAGER 61

E L + I I NPYQPR++FN +EL++LA SIK NGLIQPIIVR S +FGYELVAGER
Sbjct: 14 ELLIDLPIEDIVTNPYQPRIQFNQRELQDLATSIKSNGLIQPIIVRKSDIFGYELVAGER 73

Query: 62 RLRAAKLAKLESIPAIIKSYNNDDSMQLAIVENLQRSNLSPIEEAKAYSQLLQKKSMTHE 121

RL+A+K+A L+ +PAIIK + +SMQ AIVENLQRSNL+ IEEAKAY L++KK MTH+
Sbjct: 74 RLKASKMAGLKKVPAIIKKISTLESMQQAIVENLQRSNLNAIEEAKAYQLLVEKiaiMTHD 133



Query: 182 TEDISVRRLEKLLKQEKKTNHKSLQNKDVFLKHQENELAQFLGSKVKLTINKDGAGNIKI 241

E +SVR++E+L+ ++ S + K++F E +LA+ LG V + + + +G ++I

Sbjct: 194 IffiGLSWQIEQLV-TSTPSSKLSKKTKNIFATSLEKQLAKSLGLSVNMJCLTANHSGYLQI 252

Query: 242 AFANQEELNRIINTLK 257

+F+N +ELNRIIN LK
Sbjct: 253 SFSNDDELNRIINKLK 268

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
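
Each alignment in these examples is preceded by a summary line of the form "Identities = a/b (p%), Positives = c/d (q%), Gaps = e/f (g%)". As an illustration only, the following minimal Python sketch extracts the integer counts from such a line. The function name `parse_summary` is hypothetical; the sample line is the summary given above for the GAS and GBS proteins of this example, and the printed percentages are not reproduced or re-derived here.

```python
import re
from typing import Dict, Tuple

# Matches e.g. "Identities = 146/256"; a stray "." after "=" is tolerated
# because it occurs in the scanned text (an assumption about the damage).
_FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*\.?\s*(\d+)\s*/\s*(\d+)")


def parse_summary(line: str) -> Dict[str, Tuple[int, int]]:
    """Map each field name (lower case) to its (count, aligned length) pair."""
    return {
        name.lower(): (int(count), int(length))
        for name, count, length in _FIELD.findall(line)
    }


summary = parse_summary(
    "Identities = 146/256 (57%), Positives = 191/256 (74%), Gaps = 1/256 (0%)"
)
for field, (count, length) in summary.items():
    print(field, count, length, round(count / length, 3))
```

Fields that are absent from a line, such as "Gaps" in an ungapped alignment, are simply missing from the returned mapping.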

Example 1650 

A DNA sequence (GBSx1745) was identified in S.agalactiae <SEQ ID 5109> which encodes the amino
acid sequence <SEQ ID 5110>. Analysis of this protein sequence reveals the following: 

Possible site: 54 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.37 Transmembrane 2 - 18 ( 1 - 18)

Final Results

bacterial membrane --- Certainty=0.1150 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10297> which encodes amino acid sequence <SEQ ID
10298> was also identified.
A related DNA sequence was identified in S. pyogenes <SEQ ID 5111> which encodes the amino acid
sequence <SEQ ID 5112>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.3646 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 353/455 (77%) , Positives = 401/455 (87%) , Gaps = 6/455 (1%) 

Query: 32 MTENEQLFWNRVLELSRSQIAPAAYEFFVLEARLLKIEHQTAVITLDNIEMKKLFWEQNL 91

MTENEQ+FWNRVLEL+ + SQ+ A YEFFV +ARLLK++ A I LD +MK+LFWE+NL
Sbjct: 1 MTENEQIFWNRVLELAQSQLKQATYEFFVHDARLLKVDKHIATIYLD--QMKELFWEKNL 58

Query: 92 CSPVILTAGFEIFNAEITANYV-SNDLHLQETSFS-NyQQSSNEVMTLPIRKIDSNLKEKY 149 

VI LTAGFE + +NA+ 1 + +YV DL +++ N + +N+LP + S+L KY 

Sbjct: 59 KDVILTAGFEVYNAQISVDYVFEEDLMIEQNQTKINQKPKQQALNSLET--VTSDLNSKY 116

Query: 150 TFANFVQGDENRWAVSASIAVADSPGTTYNPLFIWGGPGLGKTHLLNAIGNQVLRDNPNA 209

+F NF+QGDENRWAV+ASIAVA++PGTTYNPLFIWGGPGLGKTHLLNAIGN VL +NPNA
Sbjct: 117 SFENFIQGDENRWAVAASIAVANTPGTTYNPLFIWGGPGLGKTHLLNAIGNSVLLENPNA 176

Query: 210 RVLYITAENFINEFVSHIRLDSMEELKEKFRNLDLLLIDDIQSLAKKTLGGTQEEFFNTF 269

R+ YITAENFINEFV HIRLD+M+ELKEKFRNLDLLLIDDIQSLAKKTL GTQEEFFNTF
Sbjct: 177 RIKYITAENFINEFVIHIRLDTMDELKEKFRNLDLLLIDDIQSLAKKTLSGTQEEFFNTF 236

Query: 270 NALHTNDKQIVLTSDRNPNQLNDLEERLVTRFSWGLPVNITPPDFETRVAILTNKIQEYP 329 

NALH N+KQIVLTSDR P+ LNDLE+RLVTRF WGL VNITPPDFETRVAILTNKIQEY 
Sbjct: 237 NALHNNNKQIVLTSDRTPDHLNDLEDRLVTRFKWGLTVNITPPDFETRVAILTNKIQEnOT 296. 

Query: 330 YDFPQDTIEYLAGEFDSNVRELEGALKMISLVADFKHAKTITVDIAAEAIRARKNDGPIV 389

+ FPQDTIEYLAG+FDSNVR+LEGALK+ISLVA+FK TITVDIAAEAIRARK DGP +
Sbjct: 297 FIFPQDTIEYLAGQFDSNVRDLEGALKDISLVANFKQIDTITVDIAAEAIRARKQDGPKM 356 

Query: 390 TVIPIEEIQIQVGKFYGVTVKEIKATKRTQDIVLARQVAMYLAREMTDNSLPKIGKEFGG 449 

TVIPIEEIQ QVGKFYGVTVKEIKATKRTQ+IVLARQVAM+LAREMTDNSLPKIGKEFGG 
Sbjct: 357 TVIPIEEIQAQVGKFYGVTVKEIKATKRTQNIVLARQVAMFLAREMTDNSLPKIGKEFGG 416

Query: 450 RDHSTVLHAYNKIKNMVAQDDNLRIEIETIKNKIR 484 
RDHSTVLHAYNKIKNM++QD++LRIEIETIKNKI+ 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1651 

A DNA sequence (GBSx1746) was identified in S.agalactiae <SEQ ID 5113> which encodes the amino
acid sequence <SEQ ID 5114>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.0556 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 
>GP:AAC45337 GB:AF000658 beta subunit of DNA polymerase III
[Streptococcus pneumoniae] 
Identities = 278/378 (73%) , Positives = 324/378 (85%) 

Query: 1 MIHFSINKNFFLHALTVTKRAISHKNAIPILSTVKIEVTRDAIILTGSNGQISIENTIPA 60

MIHFSINKN FL AL +TKRAIS KNAIPILSTVKI +VT + + L GSNGQISIEN I
Sbjct: 1 MIHFSINKNLFLQALNITKRAISSKNAIPILSTVKIDVTNEGVTLIGSNGQISIENFISQ 60

Query: 61 SNENAGLLVTNPGSILLEAGFFINIISSLPDVTLEFTEIEQHQIVLTSGKSEITLKGKDV 120

NE+AGLL+T+ GSILLEA FFIN++SSLPDVTL+F EIEQ+QIVLTSGKSEITLKGKD
Sbjct: 61 KNEDAGLLITSLGSILLEASFFINVVSSLPDVTLDFKEIEQNQIVLTSGKSEITLKGKDS 120

Query: 121 DQYPRLQEMTTDTPLTLETKLLKSIINETAFAASQQESRPILTGVHLVISQNKYFKAVAT 180 

+QYPR+QE++- TPL LETKLLK IINETAFAAS QESRPILTGVH V+SQ+K K VAT 
Sbjct: 121 EQYPRIQEISASTPLILETKLLKKIINETAFAASTQESRPILTGVHFVLSQHKELKTVAT 180 



Query: 241 YTRLLEGNYPDTDRLLTNQFETEIIFNimLRHAMERAYLISNATQNGTVRLEIQNETVS 300 

YTRLLEGNYPDTDRL+ F T I FN LR +MERA L+S+ATQNGTV+LEI++ VS 
Sbjct: 241 YTRLLEGNYPDTDRLIPTDFNTTITFMVVNLRQSMERARLLSSATQNGTVKLEIKDGVVS 300 

Query: 301 AHVNSPEVGKVNEELDTVSLKGDSLNISFNPTYLIESLKAVKSETVTIRFISPVRPFTLT 360

AHV+SPEVGKVNEE+DT + G+ L ISFNPTYLI+SLKA+ SE VTI FIS VRPFTL 
Sbjct: 301 AHTOSPEVGKVNEEIDTDQVTGEDLTISFNPTYLIDSLKALNSEKVTISFISAVRPFTLV 360 

Query: 361 PGEDTEDFIQLITPVRTN 378 

P + EDF+QLITPVRTN 
Sbjct: 361 PADTDEDFMQLITPVRTN 378

A related DNA sequence was identified in S.pyogenes <SEQ ID 5115> which encodes the amino acid
sequence <SEQ ID 5116>. Analysis of this protein sequence reveals the following:

Possible site: 14

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.70 Transmembrane 67 - 83 ( 67 - 83)

Final Results

bacterial membrane --- Certainty=0.1680 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 295/378 (78%) , Positives = 334/378 (88%) 

Query: 1 MIHFSINKNFFLHALTVTKRAISHKNAIPILSTVKIEVTRDAIILTGSNGQISIENTIPA 60

MI FSIN+ F+HAL TKRAIS KNAIPILS++KIEVT + LTGSNGQISIENTIP
Sbjct: 1 MIQFSINRTLFIHALNTTKRAISTKNAIPILSSIKIEVTSTGVTLTGSNGQISIENTIPV 60

Query: 61 SNENAGLLVTNPGSILLEAGFFINIISSLPDVTLEFTEIEQHQIVLTSGKSEITLKGKDV 120

SNENAGLL+T+PG+ILLEA FFINIISSLPD+++ EIEQHQ+VLTSGKSEITLKGKDV
Sbjct: 61 SNENAGLLITSPGAILLEASFFINIISSLPDISINVKEIEQHQVVLTSGKSEITLKGKDV 120

Query: 121 DQYPRLQEMTTDTPLTLETKLLKSIINETAFAASQQESRPILTGVHLVISQNKYFKAVAT 180

DQYPRLQE++T+ PL L+TKLLKSII ETAFAAS QESRPILTGVH+V+S +K FKAVAT
Sbjct: 121 DQYPRLQEVSTENPLILKTKLLKSIIAETAFAASLQESRPILTGVHIVLSNHKDFKAVAT 180

Query: 181 DSHRMSQRTFQLEKSANNFDLVVPSKSLREFSAVFTDDIETVEVFFSDSQMLFRSENISF 240

DSHRMSQR L+ ++ +FD+V+PSKSLREFSAVFTDDIETVEVFFS SQ+LFRSE+ISF
Sbjct: 181 DSHRMSQRLITLDNTSADFDVVIPSKSLREFSAVFTDDIETVEVFFSPSQILFRSEHISF 240

Query: 241 YTRLLEGNYPDTDRLLTNQFETEIIFlSrrNALRHAMER&YLISNATQNGTVRLEIQNETVS 300 

YTRLLEGNYPDTDRLL +FETE++FHT +LRHAMERA+LISNATQNGTV+LEI +S 
Sbjct: 241 YTRLLEGNYPDTDRLLMTEFETEVVFTSITQSLRHAMERAFLISNATQNGTVKLEITQNHIS 300 
Query: 301 AHVNSPEVGKVNEELDTVSLKGDSLNISFNPTYLIESLKAVKSETVTIRFISPVRPFTLT 360

AHVNSPEVGKVNE+LD VS G •£, ISFNPTYLIESLKA+KSETV I- F+SPVRPFTLT
Sbjct: 301 AHVNSPEVGKVNEDLDIVSQSGSDLTISFNPTYLIESLKAIKSETVKIHFLSPVRPFTLT 360

Query: 361 PGEDTEDFIQLITPVRTN 378

PG++ E FIQLITPVRTN
Sbjct: 361 PGDEEESFIQLITPVRTN 378

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1652 

A DNA sequence (GBSx1747) was identified in S.agalactiae <SEQ ID 5117> which encodes the amino
acid sequence <SEQ ID 5118>. Analysis of this protein sequence reveals the following: 

Possible site: 19

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0857 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10299> which encodes amino acid sequence <SEQ ID
10300> was also identified.
The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC00282 GB:AF008220 YtlR [Bacillus subtilis] 
Identities = 83/298 (27%) , Positives = 138/298 (45%) , Gaps = 35/298 (11%) 

Query: 19 YIIANPHAGNKNASTIVGKIQE- -LYHTEDISVFYTEQKDDEK- -KQVINILRSFKESDH 74 
+ I NP AG++N + IQ+ + + F TE + + I+ ++ +K

Sbjct: 5 FFIINPTAGHRNGLRVWKSIQKELIKRKVEHRSFLTEHPGHAEVLARQISTIQEYKLK-R 63

Query: 75 LMIIGGDGTLSKVMTYLPQ--HIPCTYYPVGSGNDFARALKIPNL KETLTA 123 

L++IGGDGT+ +V+ L I ++ P G+ NDF+R I + K LT 

Sbjct: 64 LIVIGGDGTMHEVWGLKDVDDIELSFVPAGAYNDFSRGFSIKKIDLIQEIKKVKRPLT- 122

Query: 124 IQTERLKEINCFIYDKGLIIj NSLDLGFAAYWWKASNSKIKNILNRYRLGKITYIVI 180 

+T L +N F+ DK XL N + +GF AYV KA ++ + RL + Y + 

Sbjct: 123 -RTFHLGSVN-FLQDKSQILYFMNHIGIGFDAYVNKKAMEFPLRRVFLFLRLRFLVYPL- 179

Query: 181 AIKSLLHSSK VQVLVEGETGQQLCLNDLYFFALANNTYFGGGITIWPKASALTA 234

S LH+S + E ET + +D++F ++N+ ++GGG+ P A+

Sbjct: 180 SHLHASATFKPFTLACTTEDETRE FHDVWFAWSNHPFYGGGMKAAPLANPREK 233 

Query: 235 ELDMVYAKGHTFLKRLSILLSLVFKRHTTSKSIKHQTFKAMTVYFPKNSLIEIDGEIV 292

D+V + FLK+ +L + F +HT + K +T Y DGEI+ 

Sbjct: 234 TFDIVIVENQPFLKKYWLLCLMAFGKHTKMDGVTMFKAKDITFYTKDKIPFHADGEIM 291 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1653 

A DNA sequence (GBSx1748) was identified in S.agalactiae <SEQ ID 5121> which encodes the amino
acid sequence <SEQ ID 5122>. Analysis of this protein sequence reveals the following: 

Possible site: 15
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.3792 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC45338 GB:AF000658 ORFX [Streptococcus pneumoniae] 
Identities = 46/63 (73%) , Positives = 57/63 (90%) 


Query: 1 MYQVGSLVEMKKPHACVIKETGKKANQWKVLRVGADIKIQCTNCQHVIMMSRYDFERKLK 60

MYQVG+ VEMKKPHAC IK TGKKAN+W++ RVGADIKI+C+NC+HV+MM RYDFERK+
Sbjct: 1 MYQVGNFVEMKKPHACTIKSTGKKANRWEITRVGADIKIKCSNCEHVVMMGRYDFERKMN 60

Query: 61 KVL 63

K++ 

Sbjct: 61 KII 63 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5123> which encodes the amino acid
sequence <SEQ ID 5124>. Analysis of this protein sequence reveals the following:

Possible site: 56
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.4038 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 63/65 (96%), Positives = 64/65 (97%)

Query: 1 MYQVGSLVEMKKPHACVIKETGKKANQWKVLRVGADIKIQCTNCQHVIMMSRYDFERKLK 60

MYQ+GS VEMKKPHACVIKETGKKANQWKVLRVGADIKIQCTNCQHVIMMSRYDFERKLK
Sbjct: 1 MYQIGSFVEMKKPHACVIKETGKKANQWKVLRVGADIKIQCTNCQHVIMMSRYDFERKLK 60


Query: 61 KVLQP 65 

KVLQP 
Sbjct: 61 KVLQP 65 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
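
For illustration only, the identity and similarity figures quoted in alignment summaries such as "Identities = 46/63 (73%) , Positives = 57/63 (90%)" can be re-derived from their fractions. The Python sketch below is not part of the original analysis: the function name `parse_alignment_stats` and the regular expression are assumptions made for this example. The printed percentage is kept next to the recomputed ratio, and no particular rounding rule is assumed.

```python
import re

# Hypothetical helper, not taken from the source document.
# Matches fields such as "Identities = 46/63 (73%)" and tolerates
# stray spaces around the numbers and punctuation.
STAT_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_alignment_stats(line):
    """Return a dict keyed by 'identities', 'positives' and 'gaps'.

    Each value records the matched count, the alignment length, the
    percentage printed in the summary line, and the ratio recomputed
    from the fraction.
    """
    stats = {}
    for name, matched, length, pct in STAT_RE.findall(line):
        matched, length = int(matched), int(length)
        stats[name.lower()] = {
            "matched": matched,
            "length": length,
            "reported_pct": int(pct),
            "computed_pct": 100.0 * matched / length,
        }
    return stats


if __name__ == "__main__":
    summary = "Identities = 46/63 (73%) , Positives = 57/63 (90%)"
    for field, values in parse_alignment_stats(summary).items():
        print(field, values)
```

Comparing `reported_pct` with `computed_pct` is one way to flag summary lines whose digits may have been damaged during text extraction.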

Example 1654 

A DNA sequence (GBSx1749) was identified in S.agalactiae <SEQ ID 5125> which encodes the amino
acid sequence <SEQ ID 5126>. Analysis of this protein sequence reveals the following: 

Possible site: 15
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -4.99 Transmembrane 48 - 64 ( 47 - 66)

Final Results
bacterial membrane --- Certainty=0.2996 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
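
As a reading aid, a predicted membrane-spanning segment line such as "INTEGRAL Likelihood = -4.99 Transmembrane 48 - 64 ( 47 - 66)" can be turned into numbers. The Python sketch below is an assumption-laden illustration rather than the tool's own output format: the class and function names are hypothetical, and treating the parenthesised range as an outer region around the segment is an interpretation made here.

```python
import re
from dataclasses import dataclass


@dataclass
class TransmembraneSegment:
    """One 'INTEGRAL ... Transmembrane' line (hypothetical container)."""
    likelihood: float
    segment: tuple   # (start, end) printed before the parentheses
    outer: tuple     # (start, end) printed inside the parentheses

    @property
    def length(self):
        # Residue positions are inclusive at both ends.
        return self.segment[1] - self.segment[0] + 1


# Tolerates a stray comma after INTEGRAL and missing spaces around hyphens.
SEGMENT_RE = re.compile(
    r"INTEGRAL\W*Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+"
    r"Transmembrane\s+(\d+)\s*-\s*(\d+)\s*"
    r"\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral_line(line):
    """Return a TransmembraneSegment, or None if the line does not match."""
    m = SEGMENT_RE.search(line)
    if m is None:
        return None
    likelihood = float(m.group(1))
    start, end, outer_start, outer_end = (int(g) for g in m.groups()[1:])
    return TransmembraneSegment(likelihood, (start, end), (outer_start, outer_end))


if __name__ == "__main__":
    seg = parse_integral_line(
        "INTEGRAL Likelihood = -4.99 Transmembrane 48 - 64 ( 47 - 66)"
    )
    print(seg, seg.length)
```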

Example 1655 

A DNA sequence (GBSx1750) was identified in S.agalactiae <SEQ ID 5127> which encodes the amino
acid sequence <SEQ ID 5128>. Analysis of this protein sequence reveals the following:

Possible site: 15
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.4171 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
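
The "Final Results" blocks in these examples list a certainty value for each of three localizations (cytoplasm, membrane, outside). As a minimal sketch, assuming lines of the form "bacterial cytoplasm --- Certainty=0.4171 (Affirmative)", the Python below extracts the three values and picks the highest. The function names are hypothetical and the pattern deliberately tolerates stray whitespace inside the number; it is not a parser supplied with the prediction program.

```python
import re

# Hypothetical pattern: location, certainty (possibly with stray spaces,
# e.g. "0. 4171" or "0 .2456"), and the verdict in parentheses.
RESULT_RE = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*"
    r"Certainty\s*=\s*(\d[\d\s]*\.[\d\s]*\d)\s*"
    r"\(([^)]*)\)"
)


def parse_final_results(text):
    """Return a list of (location, certainty, verdict) tuples."""
    results = []
    for location, number, verdict in RESULT_RE.findall(text):
        certainty = float(re.sub(r"\s+", "", number))  # drop stray spaces
        results.append((location, certainty, verdict.strip()))
    return results


def top_localization(text):
    """Return the tuple with the highest certainty, or None if nothing parsed."""
    results = parse_final_results(text)
    if not results:
        return None
    return max(results, key=lambda r: r[1])


if __name__ == "__main__":
    block = """
    bacterial cytoplasm --- Certainty=0.4171 (Affirmative) < succ>
    bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
    bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
    """
    print(top_localization(block))
```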

Example 1656 

A DNA sequence (GBSx1751) was identified in S.agalactiae <SEQ ID 5129> which encodes the amino
acid sequence <SEQ ID 5130>. This protein is predicted to be a GTP-binding protein. Analysis of this protein
sequence reveals the following:

Possible site: 41
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.3952 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8875> which encodes amino acid sequence <SEQ ID 8876>
was also identified. Analysis of this protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 0
McG: Discrim Score: 0.53
GvH: Signal Score (-7.5): -0.13
Possible site: 29
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 0 value: 1.48 threshold: 0.0
PERIPHERAL Likelihood = 1.48 195
modified ALOM score: -0.80

*** Reasoning Step: 3

Final Results
bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB07770 GB:AP001520 GTP-binding protein [Bacillus halodurans]
Identities = 223/329 (67%), Positives = 273/329 (82%), Gaps = 5/329 (1%)

Query: 1 MVEVPDERLQKLTELITPKKTVPTTFEFTDIAGIVKGASKGEGLGNKFLANIREVDAIVH 60

[Alignment row labels displaced from their sequence rows in the source: Query 61, 121, 181, 241, 301; Sbjct 43, 103, 158, 218, 278, 338.]

+VEVPD RLQKLTEL+ PKKTVPT FEFTDIAGIV+GASKGEGLGN+FL++IR+VDAI H . 



+VAE VL+K+K E+ K AR+IEFTEE+ K+VKGL LLT+KPVLYVANV ED V 



7 +++AFA EN+EV+V+SA+ E3EI+ELD E+K FLE +G+ ESG+D+L RAAY 



LLGL TYFTAGE+EVRAWTF++G KAPQAA I IHSDFE+GFIRA T+SY+DL++ GS 



KE G++R EGKEY+VQDGD++ 



A related DNA sequence was identified in S.pyogenes <SEQ ID 5131> which encodes the amino acid 
sequence <SEQ ID 5132>. Analysis of this protein sequence reveals the following: 

Possible site: 29
>>> Seems to have a cleavable N-term signal seq.

Final Results
bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:BAB07770 GB:AP001520 GTP-binding protein [Bacillus halodurans]
Identities = 259/371 (69%), Positives = 314/371 (83%), Gaps = 5/371 (1%) 

Query: 1 MALTAGIVGLPNVGKSTLFNAITKAGAEAANYPFATIDPNVGMVEVPDERLQKLTELITP 60
MALT GIVGLPNVGKSTLFNAIT+AGAE+ANYPF TIDPNVG+VEVPD RLQKLTEL+ P
Sbjct: 1 MALTTGIVGLPNVGKSTLFNAITQAGAESANYPFCTIDPNVGIVEVPDPRLQKLTELVNP 60

KKTVPT FEFTDIAGIV+GAS+G3GL3N+ FL+ + IR+ +DAI HWR FDDEN+

[Alignment row labels displaced from their sequence rows in the source: Query 61, 121, 181, 241, 301, 361; Sbjct 61, 119, 176, 236, 296, 356.]



VDPI DI INLELILADLES++KR++RV+K+A+T KDKE+VAE VL+K+K 



K AR+IEFTE++ K+VKGL LLT+KPVLYVANV ED V +PD +V++++ FAA EN+E 



V+V+SA+ EEEI+ELD E+K FLE +G+ ESG+D+L RAAY LLGL TYFTAGE+EVRA 



WTF++G KAPQAAGI IHSDFE+GFIRA T+SY+DL+ GS KE G++R EGKEYWQ 



DGD++ FRFNV

An alignment of the GAS and GBS proteins is shown below.

Identities = 316/329 (95%) , Positives = 322/329 (97%) 



MVEVPDERLQKLTELITPKKTVPTTFEFTDIAGIVKGAS+GEGLGNKFLANIRE+DAIVH 



WRAFDDENVMREQGREDAFVDPI ADI DT INLEL ~ LADLE S INKRYARVEKMARTQKDKE 



SVAEF1^QKIKPVLEDGKSARTIEFTE+EAKVTOGLFLLTTKPVLY7ANVDEDKVA+PD 



[Alignment row labels displaced from their sequence rows in the source: Query 1, 61, 121, 181, 241, 301; Sbjct 43, 103, 163, 223, 283, 343.]



IDYV QIR FA TENAEVWISARAEEEISELDDEDK EFLEAIGLTESGVDKLTRAAY 



SEQ ID 8876 (GBS177) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 38 (lane 4; MW 41.2kDa).

The GBS177-His fusion product was purified (Figure 118A; see also Figure 202, lane 7) and used to 
immunise mice (lane 1 product; 20µg/mouse). The resulting antiserum was used for Western blot, FACS,
and in the in vivo passive protection assay (Table III). These tests confirm that the protein is 
immunoaccessible on GBS bacteria and that it is an effective protective immunogen. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1657 

A DNA sequence (GBSx1752) was identified in S.agalactiae <SEQ ID 5133> which encodes the amino
acid sequence <SEQ ID 5134>. This protein is predicted to be stage V sporulation protein C (pth). Analysis 
of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2212 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10301> which encodes amino acid sequence <SEQ ID 
10302> was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

[Alignment row labels displaced from their sequence rows in the source: Query 6, 126, 186; Sbjct 60, 120.]

Identities = 89/187 (47%), Positives = 127/187 (57%), Gaps = 2/187 (1%)

VKMIVGLGNPGSKYNDTKHNIGFMAVDRIVKDLDVNFTEDKNFKAEIGSDFINGEKIYFI 65
+K+IVGLGNPG+KY+ T+HN+GF VD + + L++ + K G I+GEKI+ + 

MKLIVGLGNPGAKYDGTiRHNVGFDWDAVARRLNIEIKQSKA-NGLYGEGRIDGEKlFLL 59 



HLGT +F RI+VG+ R 



EVMNTFN 185 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5135> which encodes the amino acid 
sequence <SEQ ID 5136>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2840 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 148/189 (78%), Positives = 166/189 (87%) 

Query: 5 MVKMIVGLGNPGSKYNDTKHNIGFMAVDRIVKDLDVNFTEDKNFKAEIGSDFINGEKIYF 64

MVKMIVGLGNPGSKY TKHNIGFMA+D IVK+LDV FT+DKNFKA+IGS FIN EK+YF
Sbjct: 16 MVKMIVGLGNPGSKYEKTKHNIGFMAIDNIVKNLDVTFTDDKNFKAQIGSTFINHEKVYF 75

Query: 65 IKPTTFMNNSGIAVKALLTYYNISIKDMIIIYDDLDMEVGKIRFRQKGSAGGHNGIKSII 124

+KPTTFMNNSGIAVKALLTYYNI I D+I+IYDDLDMEV K+R R KGSAGGHNGIKSII 
Sbjct: 76 VKPTTFMNNSGIAVKALLTYYNIDITDLIVIYDDLDMEVSKLRLRSKGSAGGHNGIKSII 135 

Query: 125 AHLGTQEFDRIKVGIGRPNGRMTVINHVLGKFDKNDEIMILNTLDKVDNAVNYYLQTNDF 184

AH+GTQEF+RIKVGIGRP MTVINHV+G+F+ D I I TLD+V NAV +YLQ NDF
Sbjct: 136 AHIGTQEFNRIKVGIGRPLKGMTVINHVMGQFNTEDNIAISLTLDRVVNAVKFYLQENDF 195

Query: 185 QKTMQKYNG 193 

+KTMQK+NG 
Sbjct: 196 EKTMQKFNG 204

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1658 

A DNA sequence (GBSx1753) was identified in S.agalactiae <SEQ ID 5137> which encodes the amino
acid sequence <SEQ ID 5138>. This protein is predicted to be transcription-repair coupling factor (mfd). 
Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2456 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD03810 GB:AF054624 transcription-repair coupling factor 
[Lactobacillus sakei] 
Identities = 523/1051 (49%) , Positives = 733/1051 (68%) , Gaps = 20/1051 (1%) 

MNIIELFSQNKVWTWHSGLVTNSRQLVMC-FSGASKAIAIASAYEKLSKKIMWTATQTD 6 0 
M++I + + V++ RQIri- G SG++K + +A+ Y++ + ++++ + 

MDLISMLGNTQQVQSVLENQKPGVRQLLTC-LSGSAKTIjFLATiyKQQRQPLLIIESNMFQ 6 0 

SDKLSSDISSLIGEDNVYQFFADDVPAAEFIFSSLDKSISRLSALRFLKDPEKNGVLITS 120 
+++++ D+++ + D +Y F ++V AAE SS + R+ L FL +K G+++TS 
ANQVAEDLANQLNGDQIYTFPVEEVMAAEIAVSSPESRAERVRTLSFLATGKK-GIVVTS 119 



KLEG-YLVTASEVQ RTYLSEVLSTTENHFKHSDIRRFLSIFYEKEVJGI 2B7 

L+ Y TA+++ T 4S +L+ 4 ++ F+ Y + 

ALQADYQQTAAKITAKDDQICALAVNFETPISRLLAGE RLENLALFVDYLYPDHTSL 295 



+DY + DD+ +1 + L E A+ T+ L + + D + ++Q Q 

IDYFKNSGLWFDDYPRIQETQRVLAEFjyaWJQTDMLGSRRLLPAQKLLVDVHHLMKQDQ 355 

-PATFFSNFHKGLGNL.KFDKLHHPTQYGKQEFFNQFPLLVEEINRYKKSGATVLLQVDSQ 406

[Alignment row labels displaced from their sequence rows in the source, in order of appearance: 647; Sbjct 656; 707; Sbjct 716; Query 757; 827; Sbjct 836.]



h ++ ++ K +V + Q++ G L NGF D K+V++TE+E+++ 



K+K+RR ++NAERLK Y+EL GDYWH HG+G+++G+ET+E+ G+H+DY+TI Y++ 



h IPV Q++++ KYVSA+ K PKIN L ++K K +V+ ++EDIADDL++LYA+R 



+G+AF DD +Q DF+N FAY ET+DQLRS 31 K DME RPMDRLLVGDVGFGKT 



EVA+RAAFKAV KQV LVPTT+LAQQH+EN RF+++PV + +LSRF+++KE T 1 



LK L KGQVDI+IGTHRLLS+DWF DLGIi+-f +DEEQRFGVKHKE+LK+LK 



ATP I PRTLHMSMLG+RDLSVIETPPTNRYP+QTYV+E N G +REAI RE++R GQVFY+ 
ATPIPRTLHMSMLGVRDLSVIETPPTNRYPIQTYVMEQNAGAMREAIERELERNGQVFYL 835 



+N+V I+Q V E+Q LVPEA++G+ KGQM-E QLE + DF+ G YDVLV TTIIETGV

Query: 887 DISNVNTLFVENADHMGLSTLYQLRGRVGRSNRIAYAYLMYRPDKVLTEISEKRLDAIKG 946

D+ NVNT+ VE+ADH GLS LYQLRGR+GRS+R+AY Y MY+PDKVLTE+SEKRL AIK
Sbjct: 896 DMPNVNTMIVEDADHYGLSQLYQLRGRIGRSSRVAYGYFMYKPDKVLTEVSEKRLQAIKD 955

Query: 947 FTELGSGFKIAMRDLSIRGAGNILGASQSGFIDSVGFEMYSQLLEQAIATKQGKSLIRQK 1006 

FTELGSGFKIAMRDLSIRGAGN+LG Q GFIDSVGF++YSQ+L +A+A KQGK + K 
Sbjct: 956 FTELGSGFKIAMRDLSIRGAGNLLGKQQHGFIDSVGFDLYSQMLSEAVAKKQGKK-VAAK 1014 

Query: 1007 GNAELALQIDAYLPAEYISDERQKIEIYKRI 1037

NAE+ L+++AYLP +YI+D+RQKIEIYKRI 
Sbjct: 1015 TNAEIDLKLEAYLPDDYINDQRQKIEIYKRI 1045 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5139> which encodes the amino acid 
sequence <SEQ ID 5140>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2826 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 875/1161 (75%) , Positives = 1032/1161 (86%) 

^IIELFSQNKVWTWHSGLVTNSRQLVMGFSGASKAIAIASAYEKLSKKIMVVTATQTD 60 
M+I+ELFSQNK V++WHSGL T RQLVMG SG+SK +AIASAY KKI+WT+TQ + 
MDILELFSQNKKVQSVfflSGLTTLGRQLvMGLSGSSKTIAIASAYLDDQKKIvVVTSTQNE 60 



[Alignment row labels displaced from their sequence rows in the source: Query 61, 121, 181, 241, 301, 421, 481; Sbjct 1, 61, 121, 181, 241, 301, 361, 421, 481.]





+SGLR+LLPNP+VF+KSQ + +G++ D h K L+ +GYQKVSQV SPGEFS+RGDIL 



DI+E+TQE PYRLEFFGD+ID IRQF +TQKS +QLE + I+PA D+I + +DF+R 



+LE L TA + +++YL +VL+ ++N FKH DZR+F S+FYEKEW +LDYIP+GTP+F D 



NLKFDKLHHFTQYGMQEFFNQFPLLVDEINRYKKSGATVLLQVDSQKGLNLLQENLKEYG 420 
TQY MQEFFNQFPLL+DE1 RY+K+ TV++QV+SQ L+++ ++Y 



RLKDYNEL+VGDYWHNVHG+G+FLGIETI + IQGIHRDY+TIQYQN+DRIS+P++QI L

Query: 601 QNDFDNDFAYVETEDQLRSIKSIKQDMEGNRPMDRLLVGDVGFGKTEVAMRAAFKAVKDH 660

Q FD+DFA+VETEDQLRS I KEIK DME +PM)RIiLVGDVGFGKTEVAMRAAFKAVHDH 
Sbjct: 601 QRAFDDDFAFVETEDQLRSIKEIKADMESMQPMDRLLVC3DVGFGKTEVAMRAAFKAVNDH 660 

Query: 661 KQ\ArVLVPTTVLAQQHFENFKERFSNYP-VTVDVLSRFRSKKEQTDTLKRLSKGQVDIIIG 720 

KQV VLVPTTVLAQQH+ENFK RF NYPV VDVLSRFRSKKEQ +TL+R+ KGQ+DIIIG 
Sbjct: 661 KQVAVLVPTTVIAQQHYENFKARFENYPVEVDVLSRFRSKKEQAETLERVRKGQIDIIIG 720 

Query: 721 THRLLSQDWFSDLGLIVIDEEQRFGVKHKEKLKELKTKVDVLTLTATPIPRTLHMSMLG 780 

THRLLS+DWFSDLGLIVIDEEQRFGVKHKE LKELKTKVD VLTLTAT P I PRTLHMSMLG 
Sbjct: 721 THRLLSKDWFSDLGLIVIDEEQRFGVKHKETLKELKTBCVDVLTLTATPIPRTLHMSMLG 780 

Query: 781 IRDLSVIETPPTNRYPVQTYVLETNPGLWEAIIREIDRGGQVFYVYNKVDTIDQKVSEL 840 

IRDLSVIETPPTNRYPVQTYVLE NPGLWEAIIRE+DRGGQ+FYVYNKVDTI++KV+EL 
Sbjct: 781 1RDLSVIETPPTNRYPVQTYVLENNPGLVREAIIREMDRGGQIFYVYNKVDTIEKKVAEL 840 

Query: 841 QELVPEAS IGFVHGQMSEIQLENTLIDFINGDYDVLVATTI IETGVDI SNVNTLFVENAD 900 

QELVPEASIGFVHGQMSEIQLENTLIDFINGDYDVLVATTIIETGVDISNVNTLF+ENAD 
Sbjct: 841 QELVPEASIGFVHGQMSEIQLENTLIDFINGDYDVLVATTIIETGVDISNVNTLFIENAD 900 

Query: 901 HMGLSTLYQLRGRVGRSNRIAYAYLMYRPDIO^TEISEKRLDAIKGFTELGSGFKIAMRD 960 

HMGLSTLYQLRGRVGRSNRIAYAYLMYRPDKVLTE+SEKRL+AIKGFTELGSGFKIAMRD 
Sbjct: 901 HMGLSTLYQLRGRVGRSNRIAYAYLMYRPDKVLTEVSEKRLEAIKGFTELGSGFKIAMRD 960 

Query: 961 LSIRGAGNILGASQSGFIDSVGFEMYSQLLEQAIATKQGKSLIRQKGNAELALQIDAYLP 1020 

LSIRGAGNILGASQSGFIDSVGFEMYSQLLEQAIA+KQGK+ +RQKGN E+ LQIDAYLP 
Sbjct: 961 LSIRGAGMILGASQSGFIDSVGFEMYSQLLEQAIASKQGKTTVRQKGNTEINLQIDAYLP 1020 

Query: 1021 AEYISDERQKIEIYKRIRELETRADYEALQDEIjIDRFGEYPDQVAYLLEIGLLKAYLDLA 1080 

4YI+DERQKI+IYKRIRE+++R DY LQDEL+DRFGEYPDQVAYLLEI LLK Y+D A 
Sbjct: 1021 DDYIADERQKIDIYKRIREIQSREDYLNLQDELMDRFGEYPDQVAYLLEIALLKHYMDNA 1080 

Query: 1081 FTELVERKGNEISILFEKASLKYFLTQDYFFJ^SKTQLKARISETNGKMEWFNIKHKKN 1140 
F ELVERK N++ + FE SL YFLTQDYFEALSKT LKR+ISE GK+++VF+++H+K+ 

Query: 1141 YEIIEELLKFAECFIEIKSRK 1161 

Y I+EEL+ F E EIK RK 
Sbjct: 1141 YRILEELMLFGERLSEIK1RK 1161 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1659 

A DNA sequence (GBSx1754) was identified in S.agalactiae <SEQ ID 5141> which encodes the amino
acid sequence <SEQ ID 5142>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.4347 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB11835 GB:Z99104 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 50/84 (59%) , Positives = 70/84 (82%) 

Query: 1 mLDKYLKVSRIIKRRPVAKEVADKGRVKVNGVLAKSSTDLKLNDQTO 60 

MRLDK+LKVSR+IKRR +AKEVAD+GR+ +NG AK+S+D+K D++ +RFG KL+TV+V 
Sbjct: 1 mLDKFLKVSRLIKRRTIAXEVAI^RISINGNOAKASSDVKPGDELTTOFGQKLVTVQV 60 

Query: 61 LEMKDSTKKEDAIKMYEIINETRI 84

E+KD+TKKE+A MY 1+ E ++
Sbjct: 61 NELKDTTKKEEAANMYTILKEEKL 84 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5143> which encodes the amino acid
sequence <SEQ ID 5144>. Analysis of this protein sequence reveals the following:

Possible site: 27
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2963 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 72/90 (80%), Positives = 85/90 (94%)

Query: 1 MRLDKYLKVSRIIKRRPVAKEVADKGRVKVKGVIAKSSTDLK^ 60 

MRLDKYLKVSR+IKRR VAKEVADKGR+KVNG+LAKSST++KLND +EI FGNKLLTV+V 
Sbjct: 9 MRLDKYLKVSRLIKRRSVAKEVADKGRIKVKGIIAKSSTNIKIjNDHIEISFGNKLLTWV 68 


Query: 61 LEMKDSTKKEDAIKMYEIINETRIETDEQA 90 

+E+KDSTKKEDA+KMYEII+ETRI +E+A 
Sbjct: 69 IEIKDSTKKEDALKMYEIISETRITLNEEA 98

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1660 

A DNA sequence (GBSx1755) was identified in S.agalactiae <SEQ ID 5145> which encodes the amino
acid sequence <SEQ ID 5146>. This protein is predicted to be a DivIC homolog. Analysis of this protein
sequence reveals the following:

Possible site: 50
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -8.12 Transmembrane 34 - 50 ( 31 - 56)

Final Results
bacterial membrane --- Certainty=0.4248 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC98903 GB:AF023181 DivIC homolog [Listeria monocytogenes] 
Identities = 36/119 (30%) , Positives = 65/119 (54%) , Gaps = 2/119 (1%) 

Query: 2 SKPNWQl^QYINDE-NLKKRYEAEELRRKNRLMGWVLIFVMLLFILPTYNLVKSYRTL 60 
+K V ++ N+YI D +KK + RL +IF ++ +L T K TL

Sbjct: 4 AKSKVARIENRYIKDTATMKKTRSRRRIALFRRIAFMAIIFAVVGGLL-TITYTKQVLTL 62 

Query: 61 QERRQEVVKLTIu^YQTLTNRTENQKLIjAKQLKNPDYVQKYARAKYYFSKTGEMIYPLPD 119 
+E++++ V++ K + + ++ K+L N DY+ K AR++YY SK GE+I+ +P+ 

Sbjct: 63 KEKraKQVQVDKKMVAMKDEQDSLNEQIIGM 121

A related DNA sequence was identified in S.pyogenes <SEQ ID 5147> which encodes the amino acid 
sequence <SEQ ID 5148>. Analysis of this protein sequence reveals the following: 

Possible site: 50
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -3.93 Transmembrane 34 - 50 ( 32 - 51)

Final Results
bacterial membrane --- Certainty=0.2572 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 



Query: 

+ ++ V++ K+ + + + ++L +D+++ K AR++YYLS++GE+I+ IP 

Sbjct: 65 KKEKQVQVDKKMVAMKDEQDSLNEQ I KKLHNDDY IAKLARSEYYLSKDGE 1 1 FNI P 120 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 73/123 (59%) , Positives = 99/123 (80%) 
Query: 1 MSKPNVVQLNNQYINDENLKKRYEAEELRRKNRLMGWVLIFVMLLFILPTYNLVKSYRTL 60
M KP++VQLNN YI ENLKK++E EE +++NR MGW+L+ +M LFILPTYNLVKSY
Sbjct: 1 MKKPSIVQLNNHYIKKENLKKKFEEEESQKRNRFMGWILVSMMFLFILPTYNLVKSYVDF 60



121 LPK 123 

LPK 
121 LPK 123 



SEQ ID 5146 (GBS418) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 172 (lane 3; MW 42kDa). 

GBS418-GST was purified as shown in Figure 219, lane 4-5. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1661 

A DNA sequence (GBSx1756) was identified in S.agalactiae <SEQ ID 5149> which encodes the amino
acid sequence <SEQ ID 5150>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.4355 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1662

A DNA sequence (GBSx1757) was identified in S.agalactiae <SEQ ID 5151> which encodes the amino
acid sequence <SEQ ID 5152>. Analysis of this protein sequence reveals the following: 

Possible site: 21
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -5.52 Transmembrane 4 - 20 ( 3 - 22)

Final Results
bacterial membrane --- Certainty=0.3208 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5153> which encodes the amino acid
sequence <SEQ ID 5154>. Analysis of this protein sequence reveals the following:

Possible site: 21
>>> Seems to have a cleavable N-term signal seq.

Final Results
bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
An alignment of the GAS and GBS proteins is shown below. 

Identities = 205/428 (47%) , Positives = 285/428 (65%) 

Query: 1 MKKVLTFLLCSLYWSIPAISTEEPLTLSQNP^YALTQTVVDKEMYPDAIPERPTTKIEI 60 

M+K+L +L + + +P ISTE+ L S+N Y L Q W +++ IP P E 
Sbjct: 1 MRKLLAAMLMTFFLTPLPVISTEKKLIFSKNAVYQLKQDWQSTQFYNQIPSNPNLYQET 60 

Query: 61 SSFQDEALTITGETLVPNTLLSIVSLTINSNGIPVFTLSNGQFIKASREAIFNDLVSKQQ 120 

+++D LT+ L N L I SL +N +PVF L++G +++A+R+ I++D+V Q 
Sbjct: 61 CAYKDSDLTLPAGRLGWQPLLIKSLVLNKESLPVFELADGTYVEANRQLIYDDIVLNQV 120 

Query: 121 SVSLDYWLKPSFVTYEAPYTNGVSEVI<NNLKPYSRVHLVEQAETEHGIYYKTDSGFWISV 180 

+ +W + Y APY G + ++ +VH + A+T HG YY D W S 

Sbjct: 121 DIDSYFWTQKKLRLYSAPYVLGTQTIPSSFLFAQKVHATQMAQTNHGTYYLIDDKGWASQ 180 

Query: 181 EDLSVADNRMAKVQEVLLEKYNKDKXGIYIKQLNTQTVAGINIDRSMYSASIAKLATLYA 240 

EDL DNRM KVQE+LL+KYN Y I++KQLNTQT AGIN D+ MY+ASI+KLA LY 
Sbjct: 181 EDLVQFDNRMLKVQEMLLQKYNNPNYSIWKQLNTQTSAGINADKKI'IYAASISKLAPLYI 240 

Query: 241 SQEQVKLGKLSLDSKFEYfOJI^/NQFPNSYDPSGSGKLEKKADHKLYTVKELLEATAKESD 300 

Q+Q++ KL+ + Y +VN F YDP GSGK+ K AD+K Y V++LL+A A++SD 
Sbjct: 241 VQKQLQKKKLAENKTLTYTKDVNHFYGDYDPLGSGKISKIADNKDYRVEDLLKAVAQQSD 300 

Query: 301 NVATNMLGYYvNNQYDSMFQTQVBTISGMHVTO^ 3 SO 

NVATN+LGYY+- +QYD F++++ +SG+ WDM++R ++ ++A MMEAIY+Q G I++Y 
Sbjct: 301 NVATNILGYYLCHQYDKAFRSEIKALSGIDWDMEQRLLTSRSAANMMEAIYHQKGQIISY 360 

Query: 361 LSKTDFDNTRIPKNIPVKVAHKIGDAYDYKHDAAIVYAEQPFIMIIFTDKSSYDDITKIA 420 

LS T+FD RI KNI V VAHKIGDAYDYKHD AIVY PFI+ IFT+KS+Y+DIT IA 
Sbjct: 361 LSNTEFDQQRITKNITVPVAHKIGDAYDYKHDVAIVYGNTPFILSIFTNKSTYEDITAIA 420 

Query: 421 DDVYQVLK 428 

DDVY +LK 
Sbjct: 421 DDVYGILK 428

SEQ ID 5152 (GBS116) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 38 (lane 3; MW 48.5kDa). The GBS116-His fusion product was purified (Figure 
202, lane 6) and used to immunise mice. The resulting antiserum was used for FACS (Figure 316), which 
confirmed that the protein is immunoaccessible on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1663 

A DNA sequence (GBSx1758) was identified in S.agalactiae <SEQ ID 5155> which encodes the amino
acid sequence <SEQ ID 5156>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2260 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD35664 GB:AE001733 conserved hypothetical protein [Thermotoga maritima] 
Identities = 100/404 (24%) , Positives = 181/404 (44%) , Gaps = 61/404 (15%) 

QKVL1AVSGGIDSINLLQFLYQYQKELSISIGIAHINHGQRKESEKEEEYIRQWGQIHDV 78 
+ VL+AVSGG1DS+ LL L ++ L I I AH++H R+ S ++ E++ + + ++ 
EHVLVAVSGGIDSMTLLYVLRKFSPLLKIKITAAHLDHRIRESSRRDREFVERICRQWNI 65 

PVFISYF -QGIFSEDRARNHRYNFFSKVMREEGYTALVTAHHADDQAETVFMR 130 

PV S G E+ AR RY+F + ++ G + + AHH +D ETV R 

PVETSEVDVPSLim3SGKTLEEIARF^YDFLKRTA[<^GASKIALAHHI<NDLLETVVHR 125 

ILRGSRLRYLSGIKQVSAFANGQLIRPFLPYKKELLP NIFHFEDASNASSDYLR 184 

++RG+ L+ I + IRPFL +K+ + N+ + D +N + Y R 

LIRGTGPLGLACISP KREEFIRPFLVFKRSEIEEYARKNNVPYWDETNYNVKYTR 181 



[Alignment row labels displaced from their sequence rows in the source: Query 19, 79, 131, 185, 236, 294; Sbjct 6, 66, 126, 182, 292, 335.]



N ++D++ L T I) 



CTDSFKVEKRIiELHNIQIFSQYLFSYGKFISQADITIPIYDT SPIILRR 350 

FK + R+E+ G + I + + +R 

--TVFKKKYRVEVK GD^E^GFKIRVVNNRNDMKFWVRN 334 



A related DNA sequence was identified in S.pyogenes <SEQ ID 5157> which encodes the amino acid
sequence <SEQ ID 5158>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2187 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 218/424 (51%) , Positives = 290/424 (67%) , Gaps = 2/424 (0%) 

Query: 2 YNTILKDTLSKGLFTAHQKVLIAVSGGIDSINLLQFLYQYQKELSISIGIAHIHHGQRKE 61 

Y I + +K F H+ VLIAVSGG+DS+NLL FLY +Q +L I IGIAH+NH QR E 
Sbjct: 4 YQEIFNEIKMI^YFKIfflRHvlIAVSGGVDSMNLLHFLYLFQDKLKIRIGIAHVNHKQRSE 63 

Query: 62 SEKEEEYIRQWGQIHDVPVFISYFQGIFSEDRARNHRYMFFSKVMREEGYTALVTAHHAD 121 

S+ EE Y++ W + HD+P+++S F+GIFSE AR+ RY FF +M + Y+ALVTAHH+D 
Sbjct: 64 SDSEEAYLKCWAKKHDIPIYVSNFEGIFSEKAARDWRYAFFKSIMLKNNYSALVTAHHSD 123 

Query: 122 DQAETVFMRILRGSRLRYLSGIKQVSAFANGQLIRPFLPYKKELLPNIFHFEDASNASSD 181 

DQAET+ MR++RGSRLR+LSGIK V FANGQLIR.PFL + K+ LP IFHFED+SN 
Sbjct: 124 DQAETILMRLIRGSRLRHLSGIKSVQPFANGQLIRPFLTFSKKDLPEIFHFEDSSNRELS 183 

Query: 182 YLRNRIRNVYFPALERENNQLKDSLITLSEETECLFTALTDLTRSIEVTNCYDFLRQTHS 241 

+LRNR+RN Y P L++EN + L L+ E LF A +LT I T+ +F Q+ S 
Sbjct: 184 FLP^VRNNYLPLLKQENPRFIQGLNQLALENSLLFQAFKELTNHITTTDLTEFNEQSKS 243 

Query: 242 VQEFLLQDYISKFPDLQVSKEQFRVILKLIRTKANIDYTIKSGYFLHKDYESFHITKIHP 301 

+Q FLLQDY+ FPDL 4 K QF +L++I4T Y +K Y++ D SF ITKI P 

Sbjct: 244 IQYFLLQDYLEGFPDLDLKKSQFTQLLQI IQTAKQGYYYLKKDYYI FIDKFSFKITKI VP 303 



Sbjct 



3 60 GNHTKKIRRLFIDEKITLKEREEAVIGEQNKEI ,1 FVIVAGRTYLRKPSEHDIMKGKLYIE 419 

G+ +KK1RRLFIDEK T+ ER+ A+IGEQ++++IFV++ +TYLRK 4-HD1M KLYI + 
364 GHFSKKIRRLFIDEKFTIAERQNAIIGEQDEQIIFVLIGNKTYLRKACKHDIMLAKLYID 423 

420 NLEK 423 

LEK 
424 KLEK 427 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1664 

A DNA sequence (GBSx1759) was identified in S.agalactiae <SEQ ID 5159> which encodes the amino
acid sequence <SEQ ID 5160>. This protein is predicted to be hypoxanthine-guanine 
phosphoribosyltransferase (hpt). Analysis of this protein sequence reveals the following: 

Possible site: 50
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.32 Transmembrane 37 - 53 ( 37 - 53)

Final Results
bacterial membrane --- Certainty=0.1128 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA48876 GB:X69123 hypoxanthine guanine
phosphoribosyltransferase [Lactococcus lactis]
Identities = 121/179 (67%), Positives = 152/179 (84%), Gaps = 1/179 (0%) 

Query: 2 LENDIKKVLYSEEDIILKTKELGAKLTADYAGKNPLLVGVLKGSVPFMAELLKHIDTHVE 61 

L+ I+KVL SEE+II K+KELG LT +Y GKNPL++G+L+GSVPF+AEL+KHID H+E 
Sbjct: 6 LDKAIEKVLVSEEEIIEKSKELGEILTKEYEGKNPLVLGILRGSVPFLAELIKHIDCHLE 65

Query: 62 IDFMVVSSYHGGTTSSGEVKILKDVDTNIEGRDVIFIEDIIDTGRTLKYLRDMFKYRQAN 121

DFM VSSYHGGT SSGEVK++ DVDT ++GRD++ +EDIIDTGRTLKYL+++ ++R AN 
Sbjct: 66 TDFMTVSSYHGGTKSSGEVKDILDVDTAVKGRDILIVEDIIDTGRTLKYLKELLEHRGftN 125 

Query: 122 SVKVATLFDKPEGRLVDIDADYVCYDIPNEFIVGFGLDYAENYRNLPYVGVLKEEIYSK 180 

VK+ TL DKPEGR+V+I DY + IPNEF+VGFGLDY ENYRNLPYVGVLK E+Y+K 
Sbjct: 126 -VKIVTLLDKPEGRIVEIKPDYSGFTIPNEFWGFGIiDYEENYRNLPYVGVLKPEVYNK 183 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5161> which encodes the amino acid 
sequence <SEQ ID 5162>. Analysis of this protein sequence reveals the following: 

Possible site: 52
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.4095 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 153/180 (85%) , Positives = 171/180 (95%) 

Query: 1 MLENDIKKVLYSEEDIILKTKELGAKLTADYAGKNPLLVGVLKGSVPFMAELLKHIDTHV 60 

MLE DI+K+LYSE DII KTK+LG +LT DY KNPL++GVLKGSVPFMAEL+KHIDTHV 
Sbjct: 1 MLEQDIQKILYSENDIIRKTKKLGEQLTKDYQEKNPLMIGVLKGSVPFMAELMKHIDTHV 60 

Query: 61 EIDFMVVSSYHGGTTSSGEVKILKDVDTNIEGRDVIFIEDIIDTGRTLKYLRDMFKYRQA 120

EIDFMVVSSYHGGT+SSGEVKILKDVDTNIEGRD+I +EDIIDTGRTLKYLRDMFKYR+A
Sbjct: 61 EIDFMVVSSYHGGTSSSGEVKILKDVDTNIEGRDIIIVEDIIDTGRTLKYLRDMFKYRKA 120

Query: 121 NSVKVATLFDKPEGRLVDIDADYVCYDIPNEFIVGFGLDYAENYRNLPYVGVLKEEIYSK 180 

N++K+ATLFDKPEGR+V I+ADYVCY+IPNEFIVGFGLDYAENYRNLPYVGVLKEE+YSK 
Sbjct: 121 NTIKIATLFDKPEGRWKIEADYVCYNIPNEFIVGFGLDYAENYRNLPYVGVLI^EEVYSK 180 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1665 

A DNA sequence (GBSx1760) was identified in S.agalactiae <SEQ ID 5163> which encodes the amino
acid sequence <SEQ ID 5164>. This protein is predicted to be cell division protein FtsH (ftsH). Analysis of 
this protein sequence reveals the following: 

Possible site: 25 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -7.11 Transmembrane 139 - 155 ( 133 - 158)
INTEGRAL Likelihood = -4.62 Transmembrane 8 - 24 ( 7 - 31)

Final Results

bacterial membrane --- Certainty=0.3845 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC16243 GB:AF061748 cell division protein FtsH [Streptococcus pneumoniae] (ver 2) 
Identities = 490/652 (75%), Positives = 561/652 (85%), Gaps = 5/652 (0%) 

Query: 5 KNNGFLKNSFIYILLIIAVITTFQYYLKGTSSQ-NQQISYTKLVKQLKAGEIKSISYQPS 63 

+NNG +KN F+++L I ++T FQY+ G +S +QQI+YT+LV+++ G +K ++YQP+ 
Sbjct: 4 QNNGLIKNPFLWLLFIFFLVTGFQYFYSGNNSGGSQQINYTELVQEITDGNVKELTYQPN 63 

[Query and Sbjct rows for residues 64 to 651 were lost in extraction; the surviving match lines and the final Sbjct row follow.]

G V+EVSG YK KT K F SV TKV F S ILP D+++ L A ++
VKHESSSG WI+ + S +P 1+ F MM GGG R MSFG++KA++++K+++
KVRFSDVAGAEEEKQEL+EW+FLKDPKR+ LGARI PAGVLLEGPPGTGKTLLAKAVAG
EAGVPFFSISGSDFVEMFVGVGASRVRSLFEDAKKA AIIFIDEIDAVGR+RG G+GGG
NDEREQTLNQLLIEMDGFEGNE I IVIAATNRSDVLDPALLRPGRFDRKVLVG+PDVKGR
AEDRVIAGPSKKD+T+S++ER +VAYHEAGHTIVGL+LSNARVVHKVTIVPRGRAGGYMI
ALPKEDQMLLSK+DMKEQLAGLMGGRVAEEIIFN QTTGASNDFEQAT MARAMVTEYGM
KLIAEALLKYETLD+ QIK+++ETGKMPE E+++ ALSYDE+K KM +E
Sbjct: 602 KLIAEALLKYETLDSTQIKALYETGKMPEAV--EEESHALSYDEVKSKMNDE 651

A related DNA sequence was identified in S.pyogenes <SEQ ID 5165> which encodes the amino acid 
sequence <SEQ ID 5166>. Analysis of this protein sequence reveals the following: 

Possible site: 38

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -7.38 Transmembrane 138 - 154 ( 132 - 158)

Final Results

bacterial membrane --- Certainty=0.3951 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAC16243 GB:AF061748 cell division protein FtsH [Streptococcus pneumoniae] (ver 2) 
Identities = 487/654 (74%), Positives = 565/654 (85%), Gaps = 7/654 (1%) 

Query: 5 KNNGFVKNSFIYILMIIVVITGFQFYLKGTSTQ-SQQISYSKLIKHLKAGDIKSLSYQPS 63 

+NNG +KN F+++L I ++TGFQ++ G ++ SQQI+Y++L++ + G++K L+YQP+ 
Sbjct: 4 QNNGLIKNPFLWLLFIFFLVTGFQYFYSGNNSGGSQQINYTELVQEITDGNVKELTYQPN 63

Query: 64 GSIIEVKGKYEKPQKVTVNSGLSFLGGRASTQVTEFSSLVLPSDTILKEMTARADKNGTE 123 

GS+IEV G Y+ P+ +G+ F T+V +F+S +LP+DT + E+ A + E 

Sbjct: 64 GSVIEVSGVYKNPKTSKEGTGIQFFTPSV-TKVEKFTSTILPADTTVSELQKLATDHKAE 122 

Query: 124 LTVKQESSSGTWITFLMSFLPIVIFAAFMMMMM-NQ3GGG7ARGAMSFGKNKAKSQSKGNV 182 
3 WI L+S +P 1 F+ MM N GGG R MSFG++K&K+ +K 4

[Query and Sbjct rows for residues 183 to 302 were lost in extraction; only the match lines survive.]

KVRF+DVAGAEEEKQELVEW+FLK+PK++ LGARI PAGVLLEGPPGTGKTLLAKAVAG
EAGVPFFSISGSDFVEMFVGVGASRVRSLFEDAKKA

NDEREQTLNQLLIEMDGFEGNE IIVIAATNRSDVLDPALLRPGRFDRKVLVGRPDVKGR
Sbjct: 303 NDEREQTLNQLLIEMDGFEGNEGIIVIAATNRSDVLDPALLRPGRFDRKVLVGRPDVKGR 362

Query: 363 EAILRVHAKNKPLMJDVI^KWAQQTPGWGMLENVIiNEAALVAARRNKIKIDASDIDE 422
EAIL+VHAKNKPLA DV+LK+VAQQTPGFVGADLENVLNEAALVAARRNK IDASDIDE
Sbjct: 363 EAILKVHAKNKPLAEDVDLKLVAQQTPGFVGADLENVLNEAALVAARRNKSIIDASDIDE 422

Query: 423 AEDRVIAGPSKKDRTISQKEREMVAYHEAGHTIVGLVLSNARVVHKVTIVPRGRAGGYMI 482
AEDRVIAGPSKKD+T+SQKERE+VAYHEAGHTIVGLVLSNARVVHKVTIVPRGRAGGYMI
Sbjct: 423 AEDRVIAGPSKKDKTVSQKERELVAYHEAGHTIVGLVLSNARVVHKVTIVPRGRAGGYMI 482

Query: 483 ALPKEDQMLLSKEDLKEQLAGLMGGRVAEEIVFNAQTSGASNDFEQATQIARAMVTEYGM 542
ALPKEDQMLLSKED+KEQLAGLMGGRVAEEI+FN QT+GASNDFEQATQ+ARAMVTEYGM
Sbjct: 483 ALPKEDQMLLSKEDMKEQLAGLMGGRVAEEIIFWQTTGASNDFEQATQMARAMVTEYGM 542

Query: 543 SEKLGPVQYEGNHAMMPGQISPEKAYSAQTAQMIDDEVRELLNQARNQAADIINENRDTH 602
SEKLGPVQYEGNHAM+ Q SP+K+ S QTA ID+EVR

KLIAEALLKYETLD+ QIK++YETGKMP E + E+HALSYDE+K+KM

An alignment of the GAS and GBS proteins is shown below.

Identities = 550/657 (83%), Positives = 612/657 (92%), Gaps = 2/657 (0%)

Query: 1 MKNNKNNGFLKNSFIYILLIIAVITTFQYYLKGTSSQNQQISYTKLVKQLKAGEIKSISY 60
MKNNKNNGF+KNSFIYIL+II VIT FQ+YLKGTS+Q+QQISY+KL+K LKAG+IKS+SY
Sbjct: 1 MKNNKNNGFVKNSFIYILMIIVVITGFQFYLKGTSTQSQQISYSKLIKHLKAGDIKSLSY 60

[Query and Sbjct rows for residues 61 to 420 were lost in extraction; the surviving match lines follow.]

QPSG ++EV G Y+K +4 + +FLGG +T+VT F+S++LP+D+ +K +- +AA++N
4 VK ESSSGTWI+++ SFLP+VI F MMMMNQGGGG ARGAMSFGKNKA+S £
VKVRF+DVAGAEEEKQEL+EWDFLK+PK+YKSLGARIPAGVLLEGPPGTGKTLLAKA
VAGEAGVPFFSISGSDFVEMFVGVGASRVRSLFEDAKKAERAIIFIDEIDAVGRRRGAGM
GGGNDEREQTLNQLLIEMDGFEGNE+IIVIAATNRSDVLDPALLRPGRFDRKVLVG+PDV



Query: 421 IDEAEDRVIAGPSKKDRTISERERAMVAYHEAGHTIVGLILSNARVVHKVTIVPRGRAGG 480
IDEAEDRVIAGPSKKDRTIS++ER MVAYHEAGHTIVGL+LSNARVVHKVTIVPRGRAGG
Sbjct: 420 IDEAEDRVIAGPSKKDRTISQKEREMVAYHEAGHTIVGLVLSNARVVHKVTIVPRGRAGG 479

Query: 481 YMIALPKEDQMLLSKDDMKEQLAGLMGGRVAEEIIFNAQTTGASNDFEQATAMARAMVTE 540
YMIALPKEDQMLLSK+D+KEQLAGLMGGRVAEEI+FNAQT+GASNDFEQAT +ARAMVTE
Sbjct: 480 YMIALPKEDQMLLSKEDLKEQLAGLMGGRVAEEIVFNAQTSGASNDFEQATQIARAMVTE 539

Query: 541 YGMSEKLGPVQYEGNHAMMAGQNSPEKSYSAQTAQLIDDEVRHLLNEAKNKAADIINENR 600
YGMSEKLGPVQYEGNHAMM GQ+SPEK+YSAQTAQ+IDDEVR LLN+ARN+AADIINENR
Sbjct: 540 YGMSEKLGPVQYEGNHAMMPGQISPEKAYSAQTAQMIDDEVRELLNQARNQAADIINENR 599

Query: 601 DTHKLIAEALLKYETLDAAQIKSIFETGKMP-ETENDEDKARALSYDEIKEKMQEED 656
DTHKLIAEALLKYETLDAAQIKSI+ETGKMP + E D+++A ALSYDEIK KM E +
Sbjct: 600 DTHKLIAEALLKYETLDAAQIKSIYETGKMPVDLETDDNEAHALSYDEIKNKMTESE 656


SEQ ID 5164 (GBS115) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 35 (lane 8; MW 73kDa) and in Figure 39 (lane 3; MW 73.3kDa). 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1666

A DNA sequence (GBSx1769) was identified in S.agalactiae <SEQ ID 5167> which encodes the amino
acid sequence <SEQ ID 5168>. Analysis of this protein sequence reveals the following:

Possible site: 28

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2983 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1667

A DNA sequence (GBSx1770) was identified in S.agalactiae <SEQ ID 5169> which encodes the amino
acid sequence <SEQ ID 5170>. Analysis of this protein sequence reveals the following:

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2424 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9547> which encodes amino acid sequence <SEQ ID 9548>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB12187 GB:Z99106 similar to homoserine dehydrogenase [Bacillus subtilis]
Identities = 223/448 (49%), Positives = 313/448 (69%)

Query: 1 MKVVKFGGSSLASSQQLYKVLNIIKSDYTRRFVVVSAPGKRYEEDLKMTDALIQYYQNYI 60
MKVVKFGGSSLAS QL KV +I+ SD R+ VVVSAPGK Y ED K+TD LI + Y+
Sbjct: 1 MKVVKFGGSSLASGAQLDKVFHIVTSDPARKAVVVSAPGKHYAEDTKVTDLLIACAEQYL 60

[Query and Sbjct rows for the remaining blocks were lost in extraction; the surviving match lines follow.]

h TFSR GSDITGS++A G++ADLY3^FTDVD +++ +P V+NP I E
LTY+EMREL+YAGFSV HDEAL+PA+R IP+ IKNTNNP GT++V K
+E++A++ALY+ FF

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 1668 

A DNA sequence (GBSx1771) was identified in S.agalactiae <SEQ ID 5171> which encodes the amino
acid sequence <SEQ ID 5172>. This protein is predicted to be CbbY family protein. Analysis of this protein
sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2699 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF96016 GB:AE004383 CbbY family protein [Vibrio cholerae]
Identities = 59/190 (31%), Positives = 93/190 (48%), Gaps = 10/190 (5%)

Query: 4 YKAIIFDMDG\^FDTELFYYKRRERFLKQHGITIDHLPmFFIGGNMKQVWKSVLGDQYD 63 

++A IFDMDG+L DTE + + G+ IG N K + +L Y 

Sbjct: 6 FQAAIFDMDGLLLDTERVCMRVFQEACTACGLPFRQEVYLSVIGCNAKTI-NGILSQAYG 64 

Query: 64 TWDIDKL QQDYSRYKEDNPLPYKDLIFQDCKEVIEKLHHKGYLLGLASSSTRHDIM 119 

D+ +L +Q Y+ +P+KD + ++E L + + +A+S+ + + 

Sbjct: 65 E-DLPRLHNEWRQRYNAWMHEAI PHKDGVIA LLEWLKARSIPVAVATSTQKEVAL 119 

Query: 120 LALESFNLDTYFKVILSGEEFSESKPNPAIYNRAAELLDIPKQQILIVEDSEKGITAGIA 179
+ L+ LD YF I +G E ++ KP+P IY AAE L + QQ L EDS GI A +A
Sbjct: 120 IKLQIAGLDHYFAWITTGCEVTQGKPHPEIYLLAAERLGVEPQQCLAFEDSNNGIKAAMA 179 

Query: 180 AGIDVWMED 189 

A + + I D 
Sbjct: 180 AQMHAFQIPD 189 
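
Because every alignment row carries its own start and end coordinates, a row can be checked for lost or merged characters: the number of residues (gap characters excluded) must equal end minus start plus one. The following is a minimal sketch of that check with hypothetical helper names; it only detects an inconsistency and does not say which character is wrong.

```python
def residue_count(row_sequence: str) -> int:
    """Count residues in an aligned row, ignoring gaps, spaces and stray symbols."""
    return sum(1 for ch in row_sequence if ch.isalpha())


def coordinates_consistent(start: int, row_sequence: str, end: int) -> bool:
    """True when the printed coordinates agree with the number of residues."""
    return start + residue_count(row_sequence) - 1 == end


# The last block of the alignment above, as printed.
print(coordinates_consistent(180, "AQMHAFQIPD", 189))  # Sbjct row
print(coordinates_consistent(180, "AGIDVWMED", 189))   # Query row
```

The Sbjct row passes, while the Query row as printed is one residue short of its coordinates, which suggests a character was lost or merged when the text was extracted.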

There is also homology to SEQ ID 448. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1669 

A DNA sequence (GBSx1772) was identified in S.agalactiae <SEQ ID 5173> which encodes the amino
acid sequence <SEQ ID 5174>. This protein is predicted to be Pseudomonas putida enoyl-CoA hydratase II
homologue (b1394). Analysis of this protein sequence reveals the following:

Possible site: 45 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -2.18 Transmembrane 128 - 144 ( 128 - 145) 
INTEGRAL Likelihood = -1.06 Transmembrane 154 - 170 ( 154 - 170) 

Final Results 

bacterial membrane --- Certainty=0.1871 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
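
The INTEGRAL lines in these reports give, for each predicted transmembrane segment, a likelihood value, the core residue range and an extended range in parentheses. As a hedged sketch (the `Segment` type and function names are hypothetical), the lines can be parsed and the most negative likelihood selected, on the assumption that this is the value a report repeats in its `ALOM program ... value` field, as the detailed report later in this Example does.

```python
import re
from typing import List, NamedTuple


class Segment(NamedTuple):
    likelihood: float  # value printed after "Likelihood ="
    start: int         # first residue of the core segment
    end: int           # last residue of the core segment
    ext_start: int     # extended range, as printed in parentheses
    ext_end: int


_SEGMENT = re.compile(
    r"INTEGRAL\W*Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(text: str) -> List[Segment]:
    """Return every INTEGRAL segment found in the report text."""
    return [
        Segment(float(lik), int(a), int(b), int(c), int(d))
        for lik, a, b, c, d in _SEGMENT.findall(text)
    ]


report = """
INTEGRAL Likelihood = -2.18 Transmembrane 128 - 144 ( 128 - 145)
INTEGRAL Likelihood = -1.06 Transmembrane 154 - 170 ( 154 - 170)
"""
segments = parse_segments(report)
strongest = min(segments, key=lambda seg: seg.likelihood)
print(strongest, "core length:", strongest.end - strongest.start + 1)
```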

A related GBS nucleic acid sequence <SEQ ID 9549> which encodes amino acid sequence <SEQ ID 9550> 
was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5175> which encodes the amino acid 
sequence <SEQ ID 5176>. Analysis of this protein sequence reveals the following: 

Possible site: 27 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -3.08 Transmembrane 110 - 126 ( 109 - 128)

Final Results

bacterial membrane --- Certainty=0.2232 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 150/263 (57%) , Positives = 197/263 (74%) 

Query: 19 LKFENIIYGIDGNVATIMLNRPDISNGFNIPMCQEIIDAIRLVSENKDVMFLVIEAQGPI 78 

++F++II+ + ++AT+ LNRP++SNGFNIP+CQEI+ A+ V + V FL+I+A G + 
Sbjct: 1 MQFKHIIFDVVDDLATLTLNRPEVSNGFNIPICQEILVALAEVKRDTSVRFLLIKAVGKV 60

Query: 79 FSIGGDLKVMKAAVESDDISSLTKIAELWQISYDLLQLEKPVVMCVDGAVAGAAANIAL 138 

FS+GGDL M+ AV D++ SL KIAELV +IS+ + L KPV++C DGAVAGAA NIAL 
Sbjct: 61 FSVGGDLVEMQEAVAKDNVQSLVKIAELVQEISFAIKHLPKPVILCADGAVAGAAFNIAL 120

Query: 139 AADFVIASKKSKFIQAFVGVGLAPDAGGLLLLSKSIGITRAVQLALTGESLSAEKAEALG 198

A DF IAS ++KFIQAFV VGLAPDAGGL LL++++G+ RA L +TGE ++A+K G 
Sbjct: 121 AVDFCIASTQTKFIQAFVir/GLAPDAGGLFLLTRAVGLNRATHLVMTGEGITADKGLDYG 180 

Query: 199 IVYKLCESDKIGKIKDQLLKRLSRHSINSYQAIKSLAWEAAFKDWEQYKKLELQLQESLA 258 

VY+ ESDK+ K+ QLLKRL R S NSY +KSL W++ F WE Y K EL +QE LA 
Sbjct: 181 FVYRTAESDKLDKVCLQLLKRLRRGSSNSYAGMKSLVWQSFFTGWEDYAKAELAIQEELA 240 

Query: 259 FKQDFKEGVRAHADRRRPNFLGK 281

FK+DFKEGV A +RRRPNF GK 
Sbjct: 241 FKEDFKEGVIAFGERRRPNFQGK 263 



A related GBS gene <SEQ ID 8877> and protein <SEQ ID 8878> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 10
SRCFLG: 0
McG: Length of UR: 9
Peak Value of UR: 1.45
Net Charge of CR: -1
McG: Discrim Score: -5.99
GvH: Signal Score (-7.5): -4.37
Possible site: 27
>>> Seems to have no N-terminal signal sequence
Amino Acid Composition: calculated from 1
ALOM program count: 2 value: -2.18 threshold: 0.0
INTEGRAL Likelihood = -2.18 Transmembrane 110 - 126 ( 110 - 127)
INTEGRAL Likelihood = -1.06 Transmembrane 136 - 152 ( 136 - 152)
PERIPHERAL Likelihood = 1.32 49
modified ALOM score: 0.94
icml HYPID: 7 CFP: 0.187

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.1871 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>


The protein has homology with the following sequences in the databases: 

ORF01047(355 - 1143 of 1443) 

GP|3253198|gb|AAC24330.1||AF029714(1 - 263 of 263) PhaB {Pseudomonas putida}
%Match = 15.4
%Identity = 33.3 %Similarity = 56.4

Matches = 88 Mismatches = 113 Conservative Sub.s = 61 

96 126 156 186 216 246 276 306 

*KTTORGLQLVLQPVLMCGLLKINTLE*ISRRLMY**AI*VNFL*N*ITIKNGKFNSVFLFFILP*KLGL**NTKHDNLI 


336 366 396 426 456 486 516 546 

IKLFFIFLSLLKRGDILKFENIITCIDGIWATIMUqRPDISNGFlitlPMCQEIIDAIRLVSENKDVMFLVIEAQGPIFSIG 
: |,,|., |= || : ||||: | || I 1= =h= ! » I l = = 1 = 1 I I 
MTFQHILFSIEDGVAFLSLNRPEQLNSFNAAMHLEVREALKQVRQSSDARVLLLTAEGRGFCAG 
10 20 30 40 50 60



576 606 636 666 696 726 756 786 

GDLKVMKAAVESDDISSLTKIAELWQISYDLLQLEKPVVMC\TDGAVA3AAANIALAADFVIASKKSKFIQAFVGVGIAP 
II I :=: | :: | : | | ||: |:| ||| ||| || |:|:| : : ||||| :|| | 

QDLSDR1WAPDAEVPDLGESIDKFYKPLWTLRDLPLPVI»VNGVAAGAGANIPLACDLVLAGRSASFIQAFCKIGLVP
80 90 100 110 120 . 130 140 



816 846 876 906 936 966 996 1026 

nAGGLLLLSKSIGITFAVQLALTGESLSAEKAEALGIWKLCESDKIGKiroQLLKRLSRHSINSYQAIKSLAWFAAFKD 
|:|| || : =|: II ||= || I || = |: | = : = = = = = I = = 1= : || = hi =

DSG<3TWLLPRLVGMARAKALAMLGERLGAEQAQQWGLIHRVVDDARLRDEALTLARQLASQPTYGLALIK-RSLNASFDN 
160 170 180 190 200 210 220 



1053 1083 1113 1143 1173 1203 1233 1263 

-WEQYKKLELQLQESLAFKQDFKEGVRAHADRRRPNFLGK* FENQI I *D* SLANKFEL* YNLI IKV* CEWISWNTIRLI

= = : :|l I! 'h'lll I -III 1 = 

GFDEQLELERDLQRLAGRSEDYREGVSAFMNKRTPAFKGR 
240 250 260 



WO 02/34771 



-1878- 



PCT/GB01/04789 



SEQ ID 8878 (GBS374) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 64 (lane g; MW 32kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 71 (lane 2; MW 57kDa).

The GBS374-GST fusion product was purified (Figure 215, lane 9) and used to immunise mice. The
resulting antiserum was used for FACS (Figure 307), which confirmed that the protein is immunoaccessible
on GBS bacteria.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1670 

A DNA sequence (GBSx1773) was identified in S.agalactiae <SEQ ID 5177> which encodes the amino
acid sequence <SEQ ID 5178>. This protein is predicted to be a 16.1 kDa transcriptional regulator. Analysis
of this protein sequence reveals the following:

Possible site: 56

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1738 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>


The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD05186 GB:AF110185 unknown [Burkholderia pseudomallei] 
Identities = 30/102 (29%) , Positives = 60/102 (58%) 

Query: 32 DVSLKEMHTIEIIGKHSEVTPSDVARELMLTLGTVTTSLNKLEKKGYIERKRSSIDRRVV 91

+++ +++ I ++ + TP +++R+L G++T L++LEKKG++ R RS DRRV+
Sbjct: 39 ELTAQQISVILLLARGYARTPFELSRKLSYDSGSMTRMLDRLEKKGFVVRARSESDRRVI 98

Query: 92 HLSLTKRGRLLDRLHSKFHKSMVSHIIEDLGEEDIKMLTSAL 133
L+LT+RG R + ++ +E +++ +LT L

Sbjct: 99 ELALTERGAHAARADPALIATELNAQLEGFSADELALLTDLL 140

A related DNA sequence was identified in S.pyogenes <SEQ ID 5179> which encodes the amino acid 
sequence <SEQ ID 5180>. Analysis of this protein sequence reveals the following: 

Possible site: 42

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1412 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 111/144 (77%) , Positives = 129/144 (89%) 


Query: 1 MEYDQINSYLVDIFNRIMIIEEMSLKTSQFSDVSLKEMHTIEIIGKHSEVTPSDVARELM 60

+EYD+I YLVDIFNRI++IEEMSLKTSQFSDVSLKEMHTIEIIGK+ +VTPSD+ARELM 
Sbjct: 7 LEYDKIYPYLVDIFNRILVIEEMSLKTSQFSDVSLKEMHTIEIIGKYDQVTPSDIARELM 66 



Query: 61 LTLGTVTTSLNKLEKKGYIERKRSSIDRRVVHLSLTKRGRLLDRLHSKFHKSMVSHIIED 120

+TLGTVTTSLNKLE KGYI R RS DRRVV+LSLTKRGRLLDRLH+KFHK+MV H+I D
Sbjct: 67 OTLGTVTTSLNKLEAKGYIARTRSRSDRRVVYLSLTKRGRLLDRLHAKFHKNMVGHVIAD 126

Query: 121 LGEEDIKMLTSALGNLHKFLEDLV 144

+ +E+++ L LGNLH+FLEDLV
Sbjct: 127 MSDEEMQALVRGLGNLHQFLEDLV 150

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1671 

A DNA sequence (GBSx1774) was identified in S.agalactiae <SEQ ID 5181> which encodes the amino
acid sequence <SEQ ID 5182>. This protein is predicted to be 3-oxoacyl-(acyl-carrier-protein) synthase III
(fabH-2). Analysis of this protein sequence reveals the following:

Possible site: 15

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.12 Transmembrane 103 - 119 ( 103 - 119)

Final Results

bacterial membrane --- Certainty=0.1447 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF98271 GB:AF197933 beta-ketoacyl-ACP synthase III 
[Streptococcus pneumoniae] 
Identities = 225/324 (69%), Positives = 276/324 (84%), Gaps = 1/324 (0%) 

Query: 1 MVFAKISQLAHYAPSQIIKNEDLSLIMDTSDDWISSRTGIKQRHISKNETTADLANKVAE 60

M FAKISQ+AHY P Q++ N DL+ IMDT+D+WISSRTGI+QRHIS+ E+T+DLA +VA+
Sbjct: 1 MAFAKISQVAHYVPEQVVTNHDLAQIMDTNDEWISSRTGIRQRHISRTESTSDLATEVAK 60

Query: 61 QLIEKSGYSASQIDFIIVATMTPDSMMPSTAARVQAHIGASNAFAFDLSAACSGFVFALS 120
+L+ K+G + ++DFII+AT+TPDSMMPSTAARVQA+IGA+ AFAFDL+AACSGFVFALS
Sbjct: 61 KLMAKAGITGEELDFIILATITPDSMMPSTAARVQANIGANKAFAFDLTAACSGFVFALS 120

Query: 121 TAEKLISSGSYQKGLVIGAETVSKVLDWTDRGTAVLFGDGAGGVLLEASKEKHFLAESLN 180
TAEK I+SG +QKGLVIG+ET+SK +DW+DR TAVLFGDGAGGVLLEAS+++HFLAESLN
Sbjct: 121 TAEKFIASGRFQKGLVIGSETLSKAVEWSDRSTAVLFGDGAGGVLLEASEQEHFLAESLN 180

Query: 181 TDGSR-QGLQSSQVGLNSPFSDEVLDDKFLKMDGRAIFDFAIKEVSKSINHLIETSYLEK 239

+DGSR + L GL+SPFSD+ D FLKMDGR +FDFAI + +V+KSI I+ S +E
Sbjct: 181 SDGSRSECLTYGHSGLHSPFSDQESADSFLKMDGRTVFDFAIRDVAKSIKQTIDESPIEV 240

Query: 240 EDIDYLFLHQANRRILDKMSRKIDIARDKFPENMMDYGNTSAASIPILLSESYENGLLKL 299

D+DYL LHQAN RILDKM+RKI + R K P NMM+YGNTSAASIPILLSE E GL+ L
Sbjct: 241 TDLDYLLLHQANDRILDKMARKIGVDRAKLPANMMEYGNTSAASIPILLSECVEQGLIPL 300

Query: 300 DGNQTILLSGFGGGLTWGSLIVKI 323

DG+QT+LLSGFGGGLTWG+LI+ I
Sbjct: 301 DGSQTVLLSGFGGGLTWGTLILTI 324

A related DNA sequence was identified in S.pyogenes <SEQ ID 5183> which encodes the amino acid
sequence <SEQ ID 5184>. Analysis of this protein sequence reveals the following:

Possible site: 61
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.16 Transmembrane 103 - 119 ( 103 - 120)

Final Results

bacterial membrane --- Certainty=0.1065 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAF98271 GB:AF197933 beta-ketoacyl-ACP synthase III
[Streptococcus pneumoniae]
Identities = 212/324 (65%), Positives = 263/324 (80%)

Query: 1 MIFSKISQVMWPQQLVTOTDLASIMDTSHEWIFSRTGIAERHISRDEMTSDLAIQVAD 60
M F+KISQVAHYVP+Q+VTN+DLA IMDT+ EWI SRTGI +RHISR E TSDLA +VA
Sbjct: 1 MAFAKISQVAHYVPEQVVTNHDLAQIMDTNDEWISSRTGIRQRHISRTESTSDLATEVAK 60

[Query and Sbjct rows for residues 61 to 324 were lost in extraction; the surviving match lines follow.]

+L+ ++G+ + +DFII+ATI+PD+ MPSTAA+VQA I A AFAFD+TAACSGFVFAL+
A+K IASG +Q G+VIG+ETLSK V+W DR+TAVLFGDGAGGVLLEAS+ +H LAE+L+
h ++MDGR +FDFAIRDV+KSI
t LLHQAN RILDK+ARKI V R K NMM Y
DG+Q +LLSGFGGGLTWG+LI+

An alignment of the GAS and GBS proteins is shown below.

Identities = 216/324 (66%), Positives = 271/324 (82%), Gaps = 1/324 (0%)

Query: 1 MVFAKISQIAHYAPSQIIKNEDLSLIMDTSDDWISSRTGIKQRHISKNETTADLANKVAE 60
M+F+KISQ+AHY P Q++ K DL+ IMDTS +WI SRTGI +RHIS++E T+DLA +VA+
Sbjct: 1 MIFSKISQVAHWPQQLVTI^IASIMDTSHEWIFSRTGIAERHISRDEMTSDLAIQVAD 60

[Query and Sbjct rows for residues 61 to 324 were lost in extraction; the surviving match lines follow.]

QL+ +SG A IDFIIVAT++PD+ MPSTAA+VQA I A++AFAFD++AACSGFVFAL+
A+KLI+SG+YQ G+VIGAET+SK+++W DR TAVLFGDGAGGVLLEASK+KH LAE+L+
TDG+R Q L S + L+SP+S ++MDGRAIFDFAI + +VSKSI
+DIDY LHQANRRILDK++RKID+ R+KF ENMM YGNTSAASIPILLSE+ + G ++L
DG Q ILLSGFGGGLTWGSLIV+I

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1672

A DNA sequence (GBSx1775) was identified in S.agalactiae <SEQ ID 5185> which encodes the amino
acid sequence <SEQ ID 5186>. This protein is predicted to be acyl carrier protein (acpP). Analysis of this
protein sequence reveals the following:

Possible site: 59

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3083 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9551> which encodes amino acid sequence <SEQ ID 9552>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF98272 GB:AF197933 acyl carrier protein [Streptococcus pneumoniae] 
Identities = 64/74 (85%) , Positives = 67/74 (90%) 

Query: 17 MAVFEKVQEIIVEELGKDAEEVTLNTTFDDLDADSLDVFQVISEIEDAFDIQIETEEGLN 76
MAVFEKVQEIIVEELGKDA EVTL +TFDDLDADSLD+FQVISEIEDAFDIQIE E L

Sbjct: 1 MAVFEKVQEIIVEELGKDASEVTLESTFDDLDADSLDLFQVISEIEDAFDIQIEAENDLK 60

Query: 77 TVGDLVAYVEEKVK 90
TVGDLVAYVEE+ K
Sbjct: 61 TVGDLVAYVEEQAK 74
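
In these alignments the unlabelled middle row summarises each column: a letter marks an identical residue, `+` marks a conservative substitution, and a blank marks a mismatch. The sketch below (helper names are hypothetical) counts identities and positives for one block from that row and cross-checks the identity count against the two sequence rows. It assumes the three strings are column-aligned, which has to be restored by hand for text extracted like this.

```python
from typing import Tuple


def score_match_line(match_line: str) -> Tuple[int, int]:
    """Return (identities, positives) for one block's match line."""
    identities = sum(1 for ch in match_line if ch.isalpha())
    positives = identities + match_line.count("+")
    return identities, positives


def identities_from_rows(query: str, sbjct: str) -> int:
    """Count columns where both rows carry the same residue."""
    return sum(1 for q, s in zip(query, sbjct) if q == s and q.isalpha())


# Final block of the acyl carrier protein alignment above, columns aligned.
query = "TVGDLVAYVEEKVK"
match = "TVGDLVAYVEE+ K"
sbjct = "TVGDLVAYVEEQAK"

print(score_match_line(match), identities_from_rows(query, sbjct))
```

Summing these per-block counts over a whole alignment gives the numerators of the `Identities` and `Positives` fields, provided every block survived extraction intact.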

A related DNA sequence was identified in S.pyogenes <SEQ ID 5187> which encodes the amino acid 
sequence <SEQ ID 5188>. Analysis of this protein sequence reveals the following: 

Possible site: 43 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2995 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 70/74 (94%) , Positives = 71/74 (95%) 

Query: 17 MAVFEKVQEIIVEELGKDAEEVTLNTTFDDLDADSLDVFQVISEIEDAFDIQIETEEGLN 76

MAVFEKVQEIIVEELGK+ EEVTL TTFDDLDADSLDVFQVISEIEDAFDIQIETEEGLN
Sbjct: 1 MAVFEKVQEIIVEELGKETEEVTLETTFDDLDADSLDVFQVISEIEDAFDIQIETEEGLN 60

Query: 77 TVGDLVAYVEEKVK 90
TVGDLVAYVEEK K

Sbjct: 61 TVGDLVAYVEEKSK 74

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1673

A DNA sequence (GBSx1777) was identified in S.agalactiae <SEQ ID 5189> which encodes the amino
acid sequence <SEQ ID 5190>. Analysis of this protein sequence reveals the following:

Possible site: 31

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.27 Transmembrane 156 - 172 ( 156 - 173)

Final Results

bacterial membrane --- Certainty=0.1107 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF98273 GB:AF197933 trans-2-enoyl-ACP reductase II
[Streptococcus pneumoniae]
Identities = 257/318 (80%), Positives = 277/318 (86%), Gaps = 1/318 (0%)

Query: 1 MKTRITELLNIKYPIFQGGMAWVADGDLAGAVSKAGGLGIIGGGNAPKEVVKANIDKIKS 60

MKTRITELL I YPIFQGGMAWVADGDLAGAVSKAGGLGIIGGGNAPKEVVKANIDKIKS
Sbjct: 1 MKTRITELLKIDYPIFQGGMAWVADGDLAGAVSKAGGLGIIGGGNAPKEVVKANIDKIKS 60

Query: 61 MTDKPFGVNIMLLSPFVDDIVDLVIEEGVKVVTTGAGNPGKYMERFHEAGITVIPVVPSV 120

+TDKPFGVNIMLLSPFV+DIVDLVIEEGVKVVTTGAGNP KYMERFHEAGI VIPVVPSV
Sbjct: 61 LTDKPFGVNIMLLSPFVEDIVDLVIEEGVKVVTTGAGNPSKYMERFHEAGIIVIPVVPSV 120

Query: 121 ALAKRMEKLGADAIITEGMEAGGHIGKLTTMTLVRQVVDAVTIPVIAAGGIADGRGAAAG 180

ALAKRMEK+GADA+I EGMEAGGHIGKLTTMTLVRQV A++IPVIAAGGIADG GAAAG
Sbjct: 121 ALAKRMEKIGADAVIAEGMEAGGHIGKLTTMTLVRQVATAISIPVIAAGGIADGEGAAAG 180

Query: 181 FMLGADAVQVGTRFVVAKESNAHPNYKAKILKAKDIDTAVSAQVVGHPVRALKNKLVTTY 240

FMLGA+AVQVGTRFVVAKESNAHPNYK KILKA+DIDT +SAQ GH VRA+KN+L +
Sbjct: 181 FMLGAEAVQVGTRFVVAKESNAHPNYKEKILKARDIDTTISAQHFGHAVRAIKNQLTRDF 240

Query: 241 SQAEKDYLAGRISINEI-EELGAGALRNAVVDGDVINGSVMAGQIAGLIKSEETCQEILE 299

AEKD EI E++GAGAL AVV GDV GSVMAGQIAGL+ EET +EIL+

Sbjct: 241 ELAEKDAFKQEDPDLEIFEQMGAGALAKAVVHGDVDGGSVMAGQIAGLVSKEETAEEILK 300

Query: 300 DIYSGARQVILSEASRWS 317

D+Y GA + I EASRW+
Sbjct: 301 DLYYGAAKKIQEEASRWT 318

A related DNA sequence was identified in S.pyogenes <SEQ ID 5191> which encodes the amino acid 
sequence <SEQ ID 5192>. Analysis of this protein sequence reveals the following: 

Possible site: 35 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.70 Transmembrane 106 - 122 ( 106 - 124)
INTEGRAL Likelihood = -0.22 Transmembrane 156 - 172 ( 156 - 173)

Final Results

bacterial membrane --- Certainty=0.1680 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAF98273 GB:AF197933 trans-2-enoyl-ACP reductase II 
[Streptococcus pneumoniae] 
Identities = 252/320 (78%), Positives = 276/320 (85%), Gaps = 1/320 (0%) 

Query: 1 MKTRITELLNIDYPIFQGGMAWVADGDLAGAVSNAGGLGIIGGGNAPKEVVKANIDRVKA 60

MKTRITELL IDYPIFQGGMAWVADGDLAGAVS AGGLGIIGGGNAPKEVVKANID++K+
Sbjct: 1 MKTRITELLKIDYPIFQGGMAWVADGDLAGAVSKAGGLGIIGGGNAPKEVVKANIDKIKS 60

Query: 61 ITDRPFGVNIMLLSPFADDIVDLVIEEGVKVVTTGAGNPGKYMERLHQAGIIVVPVVPSV 120

+TD+PFGVNIMLLSPF +DIVDLVIEEGVKVVTTGAGNP KYMER H+AGIIV+PVVPSV
Sbjct: 61 LTDKPFGVNIMLLSPFVEDIVDLVIEEGVKVVTTGAGNPSKYMERFHEAGIIVIPVVPSV 120

Query: 121 ALAKRMEKLGVDAVIAEGMEAGGHIGKLTTMSLVRQVVEAVSIPVIAAGGIADGHGAAAA 180

ALAKRMEK+G DAVIAEGMEAGGHIGKLTTM+LVRQV A+SIPVIAAGGIADG GAAA
Sbjct: 121 ALAKRMEKIGADAVIAEGMEAGGHIGKLTTMTLVRQVATAISIPVIAAGGIADGEGAAAG 180

Query: 181 FMLGAEAVQIGTRFVVAKESNAHQNFKDKILAAKDIDTVISAQVVGHPVRSIKNKLTSAY 240

FMLGAEAVQ+GTRFVVAKESNAH N+K+KIL A+DIDT ISAQ GH VR+IKN+LT +
Sbjct: 181 FMLGAEAVQVGTRFVVAKESNAHPNYKEKILKARDIDTTISAQHFGHAVRAIKNQLTRDF 240

Query: 241 AKAEK-AFLIGQKTATDIEEMGAGSLRHAVIEGDVVNGSVMAGQIAGLVRKEESCETILK 299

AEK AF E+MGAG+L AV+ GDV GSVMAGQIAGLV KEE+ E ILK

Sbjct: 241 ELAEKDAFKQEDPDLEIFEQMGAGALAKAVVHGDVDGGSVMAGQIAGLVSKEETAEEILK 300

Query: 300 DIYYGAARVIQNEAKRWQSV 319

D+YYGAA+ IQ EA RW V
Sbjct: 301 DLYYGAAKKIQEEASRWTGV 320

An alignment of the GAS and GBS proteins is shown below.

Identities = 253/319 (79%) , Positives = 291/319 (90%) 

Query: 1 MKTRITELLNIKYPIFQGGMAWVADGDLAGAVSKAGGLGIIGGGNAPKEVVKANIDKIKS 60

MKTRITELLNI YPIFQGGMAWVADGDLAGAVS AGGLGIIGGGNAPKEVVKANID++K+
Sbjct: 1 MKTRITELLNIDYPIFQGGMAWVADGDLAGAVSNAGGLGIIGGGNAPKEVVKANIDRVKA 60

Query: 61 MTDKPFGVNIMLLSPFVDDIVDLVIEEGVKVVTTGAGNPGKYMERFHEAGITVIPVVPSV 120

+TD+PFGVNIMLLSPF DDIVDLVIEEGVKVVTTGAGNPGKYMER H+AGI V+PVVPSV
Sbjct: 61 ITDRPFGVNIMLLSPFADDIVDLVIEEGVKVVTTGAGNPGKYMERLHQAGIIVVPVVPSV 120

Query: 121 ALAKRMEKLGADAIITEGMEAGGHIGKLTTMTLVRQVVDAVTIPVIAAGGIADGRGAAAG 180

ALAKRMEKLG DA+I EGMEAGGHIGKLTTM+LVRQVV+AV+IPVIAAGGIADG GAAA
Sbjct: 121 ALAKRMEKLGVDAVIAEGMEAGGHIGKLTTMSLVRQVVEAVSIPVIAAGGIADGHGAAAA 180

Query: 181 FMLGADAVQVGTRFVVAKESNAHPNYKAKILKAKDIDTAVSAQVVGHPVRALKNKLVTTY 240

FMLGA+AVQ+GTRFVVAKESNAH N+K KIL AKDIDT +SAQVVGHPVR++KNKL + Y
Sbjct: 181 FMLGAEAVQIGTRFVVAKESNAHQNFKDKILAAKDIDTVISAQVVGHPVRSIKNKLTSAY 240

Query: 241 SQAEKDYLAGRISINEIEELGAGALRNAVVDGDVINGSVMAGQIAGLIKSEETCQEILED 300

++AEK +L G+ + +IEE+GAG+LR+AV++GDV+NGSVMAGQIAGL++ EE+C+ IL+D
Sbjct: 241 AKAEKAFLIGQKTATDIEEMGAGSLRHAVIEGDVVNGSVMAGQIAGLVRKEESCETILKD 300

Query: 301 IYSGARQVILSEASRWSDL 319

IY GA +VI +EA RW +
Sbjct: 301 IYYGAARVIQNEAKRWQSV 319

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1674 

A DNA sequence (GBSx1778) was identified in S.agalactiae <SEQ ID 5193> which encodes the amino
acid sequence <SEQ ID 5194>. This protein is predicted to be MCAT (fabD). Analysis of this protein
sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1276 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with a S. pneumoniae sequence: 

Identities = 203/306 (66%), Positives = 242/306 (78%), Gaps = 1/306 (0%) 

Query: 1 ^KVSFLFAGQGAQKLGMARDLYETFPIVKETFDKASF^^ 60 

M K +FLFAGQGAQ LGM RD Y+ 4-PIVKET D+AS VLGYDLR LID + DKLNQT+Y 
Sbjct: 1 MTKTAFLFAGQGAQYLGMGRDFYDQYPIVKETIDRASQVLGYDLRYLIDTEEDKLNQTRY 60 

Query: 61 TQPAILTTSTAIYRLILKEIELRPDMVAGLSLGEYSALVASGAIRFEDAVVLVARRGQLM 120
TQPAIL TS AIYRL L+E +PDMVAGLSLGEYSALVASGA+ FEDAV LVA+RG M

Sbjct: 61 TQPAILATSVAIYRL-LQEKGYQPDMVAGLSLGEYSALVASGALDFEDAVALVAKRGA™ 119

Query: 121 EAAA.PAGSGKMVAVLNADRQIIEDACKKaSQFGIVSPANYNTPKQIVIGGESIAVNAAVE 180 

E AAPA SGKMVAVIiN ++1E+AC+KAS+ G+V+PANYNTP Q1VI GE 4-AV+ AVE 
Sbjct: 120 EEAAPADSGKIWAVLNTPVEVIEEACQKASELC-WTPAmOT 179 

Query: 181 ELKQQGVKRLIPL1WSGPFHTALLKPA8QKLSDVLDKVHFSVSE1PVIGNTEAQIMKKDD 240

L++ G KRLIPL VSGPFHTAL1,+PASQKI 1 ++ L +V FS P++GNTEA +M+K+D 
Sbjct: 180 LLQEAGAKRLIPLKVSGPFHTALLEPASQKLAETLAQVSFSDFTCPLVGNTEAAVMQKED 239 

Query: 241 IKSLLARQVMEPVRFDESIETMKICMGMTQWE1GPGKVLSGFLKKIDSSLSVHSVEDKIG 300 

I LL RQV EPVRF ESI M++ G+-S- +EIGPGKVLSGF+KKID + + VED+ 
Sbjct: 240 IAQLLTRQVKEPVRFYESIGVMQEAGISNFIE1GPGKVLSGFVKKIDQTAHLAHVEDQAS 299 

Query: 301 FNNLKE 306 
L E 

Sbjct: 300 LVALLE 305 
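
By way of illustration only, the summary line that precedes each alignment (for example "Identities = 203/306 (66%)") can be read programmatically. The following is a minimal sketch, not part of the original analysis; the function names `parse_blast_summary` and `percent_truncated` are hypothetical. It assumes the percentage is the integer part of count/length, which reproduces the identity figure quoted in the example line but is not asserted to be how the alignment program computes its figures.

```python
import re

# Matches fields such as "Identities = 203/306 (66%)" in a summary line.
_FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_blast_summary(line):
    """Return {field name: (count, alignment length)} for one summary line."""
    return {name: (int(n), int(d)) for name, n, d in _FIELD.findall(line)}


def percent_truncated(count, length):
    """Integer percentage, truncated rather than rounded (an assumption)."""
    return count * 100 // length


summary = parse_blast_summary(
    "Identities = 203/306 (66%), Positives = 242/306 (78%), Gaps = 1/306 (0%)"
)
identities, length = summary["Identities"]
print(identities, length, percent_truncated(identities, length))
```

Fields that are absent from a summary line (some alignments in this section report no gaps) are simply missing from the returned dictionary.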

A related DNA sequence was identified in S. pyogenes <SEQ ID 5195> which encodes the amino acid 
sequence <SEQ ID 5196>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1602 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 201/299 (67%), Positives = 248/299 (82%), Gaps = 1/299 (0%) 

Query: 1 IWKVSFLFAGQGftQKLGMJUOTLYETFPIVKETFDKASHVIjGYDLRELIDKrJLDKLNQTKY 60

M K +FLFAGQGAQKLGMARD Y+ F IV++TFD+AS VLGYDLR LID D KLNQT Y 
Sbjct: 3 MTKTAFLFAGQGAQKLGMARDFYDNFA1VRKTFDQASQVLGYDLRRLIDSDELKLNQTSY 62

Query: 61 TQPAILTTSTAIYRLILKEIELRPDMVAGLSLGEYSAIjVASGAIRFEDAWLVARRGQLM 120 

TQPAILT+S AIYR +L ++PDMVAGLSLGEYSALVASGA+ FED + LVA+RG+LM 
Sbjct: 63 TQPAI LTSS IAI YR - VLGLHHVKPDMVAGL SLGEYSALVASGALS FEDTLSLVAKRGRLM 121 

Query: 121 FAAAPAGSGKIWAVI^ADRQIIEDACKKASQFGIVSPANYNTPKQIVIGGESIAOTAAVE 180 

E AAP GSGKMVAV+N D Q-S-IE+ C+ A++ G+V+PANYNTP QIVIGG++ AVN AVE 
Sbjct: 122 EEMPQGSGKIWAVMNTDVQVIEEVCQIAAKHGWAPANYNTPSQIVIGGQTDAVHVAVE 181 

Query: 181 ELKQQGVKRLIPLNVSGPFHTALLKPASQKLSDVLDKVHFSVSEIPVIGNTEAQIMKKDD 240

LK++GVKRLIPLNVSGPFHTALL+PAS+ L+ L++ +FS +IP++GNTEA IM+KD 
Sbjct: 182 LLlffiRGVl^LIPI^SGPFHTALLEPASRLlAKIlLERYNFSDFKIPLVGOT'FAlJIMEKDR 241 

Query: 241 IKSLLARQVMEPTOFDESIETMKKKGMTQVVEIGPGICVLSGFLKKIDSSLSVHSVEDKI 299 

I LLARQVMEPVRF +S+ T+ + G+TQ +E+GPGKVL+GF+KKID +L SVE+ + 
Sbjct: 242 IPELLARQVMEPWFYDSVATIjVESGITQFIEVGPGKVLTGFVKKIDBCNLLCTSVENMV 300 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1675 

A DNA sequence (GBSx1779) was identified in S.agalactiae <SEQ ID 5197> which encodes the amino
acid sequence <SEQ ID 5198>. This protein is predicted to be beta-ketoacyl-ACP reductase (fabG). 
Analysis of this protein sequence reveals the following: 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0930 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF98275 GB:AF197933 beta-ketoacyl-ACP reductase [Streptococcus pneumoniae] 
Identities = 184/243 (75%) , Positives = 212/243 (86%) 

Query: 1 MQLKDKNIFITGSSRGIGLAIAHQFAQLGANlVIiNGRSEISBDLlAEFADYGVKVIAISG 60
M+L+ KNIFITGSSRGIGLAIAH+FAQ GANIVLN R ISE+L+AEF++YG+KV4 ISG
Sbjct: 1 MKLEHKNIFITGSSRGIGLAIAHKFAQAGANIVLNSRGAISEELLAEFSNYGIKWPISG 60

Query: 61 DVSSFEDANRMIKFJilASLGSVDVLVNNAGITNDKLMLKMTV^DFESVLKI^TGAFNMT 120
DVS F DA RMI +AIA LGSVDVLVNNAGIT D LMLKMT DFE VLK+NLTGAFNMT

QSVLKPM KAR+GAIIN+SSWGL GN+GQANYAASKAGLIGFTKSVAREVA+R IRVN

IAPG IESDMT ++ +K++EA IiAQIPMK G+ ++VA + FLA Q+YLTGQV+AIDGG

A related DNA sequence was identified in S.pyogenes <SEQ ID 3865> which encodes the amino acid
sequence <SEQ ID 3866>. Analysis of this protein sequence reveals the following:

Final Results 

bacterial cytoplasm --- Certainty=0.1088 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 201/244 (82%) , Positives = 220/244 (89%) 

Query: 1 MQLKDKNIFITGSSRGIGIiAIAHQFAQLGANIVLNGRSEISEDLIAEFADYGVKVIAISG 60
M4+K KNIFITGS+RGIGLA+AHQFA L ANIVLNGRS ISE+L+A F DYGV V+ ISG
Sbjct: 1 MEIKGKNIFITGSTRGIGLAMAHQFASLFANIVLNGRSAISEELVASFTDYGVTVVTISG 60

Query: 61 DVSSFEDANRMIKFJUASLGSVDVTjVNN&GITISroKL^ 120

DVS +A RM+ EAI SLGS+DVLVNNAGITNDKLMLKMT EDFE VLKINLTGAFNMT

QSVLKPM KARQGAIIN+SSWGLTGN-i-GQANYAASKAG+IGFTKSVAREVAAR I VNA

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1676 

A DNA sequence (GBSx1780) was identified in S.agalactiae <SEQ ID 5199> which encodes the amino
acid sequence <SEQ ID 5200>. This protein is predicted to be 3-oxoacyl-(acyl-carrier-protein) synthase II
(fabF). Analysis of this protein sequence reveals the following: 

Possible site: 51 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.37 Transmembrane 338 - 354 ( 338 - 354)

Final Results

bacterial membrane --- Certainty=0.1150 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
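
By way of illustration only, an "INTEGRAL" line of the kind shown above can be split into its numeric fields as follows. This is a minimal sketch under stated assumptions: the function name `parse_integral` and the dictionary keys are hypothetical, and no interpretation of the bracketed range beyond "a second pair of positions" is asserted.

```python
import re

# One predicted segment: a likelihood value, a position range, and a second
# range printed in parentheses.
_INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral(line):
    """Return the numeric fields of one INTEGRAL line, or None if no match."""
    m = _INTEGRAL.search(line)
    if m is None:
        return None
    start, end, b_start, b_end = (int(g) for g in m.groups()[1:])
    return {
        "likelihood": float(m.group(1)),
        "segment": (start, end),          # first range on the line
        "bracketed": (b_start, b_end),    # range printed in parentheses
        "length": end - start + 1,        # residues spanned by the first range
    }


print(parse_integral(
    "INTEGRAL Likelihood = -0.37 Transmembrane 338 - 354 ( 338 - 354)"
))
```

Lines in which a field has been lost in reproduction do not match the pattern and yield `None` rather than a guessed value.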

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF98276 GB:AF197933 beta-ketoacyl-ACP synthase II
[Streptococcus pneumoniae] 
Identities = 340/410 (82%), Positives = 375/410 (90%) 

Query: 1 MTLQRWVTGYGVTSPIGNTPEEFWNSLKEGNVGIGPITKFDSSDFMVKNAAEIHDFPFD 60 

M L RWVTGYGVTSPIGNTPEEFWNSL G +GIG ITKFD SDF V NAAEI DFPFD 
Sbjct: 1 MIO^VVWGYGvTSPIGNTPEEFraSLATGKIGIGGITKFDHSDFDVHNAAEIQDFPFD 60 

Query: 61 KYFVKKDIJTOFDMYSLYALYASSEAIQHANLNLDEIDADRFGVIVASGIGGIQEIEEQVI 120 

KYFVKKD NRFD YSLYALYA+ EA+ HANL+++ ++ DRFGVIVASGIGGI+EIE+QV+ 
Sbjct: 61 KYFVKKDTNRFDNYSLYALYAAQEAWHANLDVEALNRDaFGVIVASGIGGIKEIEDQVL 120 

Query: 121 RLHEKGPKKVKPMTLPKALPimaAGIWAMRLGAHGVCKSINTACASSNDAIGDAFRNIKF 180 

RLHEKGPKRVKPMTLPKALPNMA+GNVAMR GA+GVCKSINTAC+SSNDAIGDAFR+IKF 
Sbjct: 121 RLHEKGPKRVKPMTLPKALPNMASGNVAMRFGANGVCKSINTACSSSNDAIGDAFRSIKF 180 

Query: 181 GIQDIMWGGAEAAITKFAIAGFQSLTALSTTEDPSRASIPFDKDRNGFIMGEGSGMLVL 240 

G QD+M+VGG EA+IT FAIAGFQ+LTALSTTEDP+RASIPFDKDRNGF+MGEGSGMLVL 
Sbjct: 181 GFQDVMLVGGTEASITPFAIAGFQALTALSTTEDPTRASIPFDKDRNGFVMGEGSGMLVL 240 



Query: 301 NAHGTSTPANEKGESQAIVAALGTDVPVSSTKSFTGHLLGAAGAVEAIATIEAIRHSYVP 360 

NAHGTSTPANEKGES AIVA LG +VPVSSTKSFTGHjLGAAGAVEAI TIEA+RH++VP 
Sbjct: 301 NAHGTSTPANEKGESGAIVAVLGKEVPVSSTKSFTGHLLGAAGAVEAIVTIEAMRHNFVP 360

Query: 361 MTAGTTELSEDITANVIFGQGQDADIRYAISNTFGFGGHNAVLAFKRWED 410 

MTAGT+E+S+ I ANV4+GQG + +1 YAISNTFGFGGHNAVLAFKRWE+ 
Sbjct: 361 MAGTSEVSDYIEaWVYGQGLEKEIPYAISNTFGFGGHNAVLAFKRWEN 410 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3851> which encodes the amino acid
sequence <SEQ ID 3852>. Analysis of this protein sequence reveals the following:

Final Results

bacterial cytoplasm --- Certainty=0.0890 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
An alignment of the GAS and GBS proteins is shown below.

Identities = 346/410 (84%), Positives = 377/410 (91%) 

Query: 1 MTLQRVVVTGYGVTSPIGOTPEEFWNSLKEGWGIGPITKFDSSDFMVKrJAAEIHDFPFD 60 

MT +RVWTGYG+TSPIG+ PE FWN+LK G +GIGPITKFD++D+ VKHAAEI DFPFD 
Sbjct: 1 MTFKRWVTGYGLTSPIGHDPETFWNNLKaGQIGIGPITKFDTTDYAVKNAAEIQDFPFD 60 

Query: 61 KYFVKKDLNRFDMYSLYALYASSEAIQHANLNLDEIDADRFGVIVASGIGGIQEIEEQVI 120 

KYFVKKDLNRFD YSLYALYA+ EAI HA+LN++ +D+DRFGVIVASGIGGI EIEEQVI 
Sbjct: 61 K^FVKKDLI^FDRYSLYALYAAKEAINHADIjNIEI^VDSDRFGVIVASGIGGIAEIEEQVI 120 

Query: 121 RLHEKGPKRVKPMTLPKALPNMAAGNVAN1RLGAHGVCKSINTACASSNDAIGDAFRNIKF 180 

RLHEKGPKRVKPMTLPKALPNMAAGNVAM l a gvcksintacassndaigdafr ikf 
Sbjct: 121 RLHEKGPKRWPMTLPKALPmAAGNVANlSLKAQGVCKSIMTACASSNDAIGDAFR&IKF 180 

Query: 181 GIQDIMWGGAEAAITKFA1AGFQSLTALSTTEDPSRASIPFDKDRNGFIMGEGSGMLVL 240 

G QD+M+VGG+EAAITKFAIAGFQSLTALSTTEDPSR+SIPFDKDRNGFIMGEGSGMLVL 
Sbjct: 181 GTQDVMIVGGSEAAITKFAIAGFQSLTA1STTEDP3RSSIPFDKDRNGFIMGEGSGMLVL 240 




Query: 301 NAHGTSTPANEKGESQAIVAALGTDVPVSSTKSFTGHLLGAftGAVEAIATIEAIRHSYVP 360 

NAHGTSTPANEKGESQAIVA LG DVPVSSTKSFTGHLLGAAGA+EAIATIEA+RH+YVP 
Sbjct: 301 NAHGTSTPANEKGESQAIVAVLGJCDVPVSSTKSFTGHLLGMGAIEAIATIEAMRHNYVP 360 

Query: 361 MTAGTTELSEDITANVIFGQGQDADIRYAISNTFGFGGHNAVIAFKRWED 410 

MTAGT LSEDI ANVIFG+G++ I YAISNTFGFGGHNAVIAFK WE+ 
Sbjct: 361 MTAGTQALSEDIFANVIFGEGlffiTAIMAISNTFGFGGHl^KVIAFKCWEE 410 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1677 

A DNA sequence (GBSx1781) was identified in S.agalactiae <SEQ ID 5201> which encodes the amino
acid sequence <SEQ ID 5202>. Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3052 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
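
By way of illustration only, the three "Final Results" lines can be reduced to a single predicted location by taking the entry with the highest certainty. The following is a minimal sketch, not the method used to generate the results; the names `parse_final_results` and `most_likely` are hypothetical. The pattern tolerates stray spaces inside the number, an assumption made to cope with reproduction artefacts.

```python
import re

# "bacterial <location> ... Certainty=<value> (<label>)"; spaces inside the
# value are tolerated and removed before conversion.
_RESULT = re.compile(
    r"bacterial\s+(\w+)\W+Certainty\s*=\s*([0-9][0-9. ]*?)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return {location: (certainty, label)} for a Final Results block."""
    results = {}
    for location, value, label in _RESULT.findall(text):
        results[location] = (float(value.replace(" ", "")), label.strip())
    return results


def most_likely(results):
    """Location with the highest certainty value."""
    return max(results, key=lambda loc: results[loc][0])


block = """
bacterial cytoplasm --- Certainty=0.3052 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""
parsed = parse_final_results(block)
print(most_likely(parsed), parsed[most_likely(parsed)])
```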

A related GBS nucleic acid sequence <SEQ ID 9553> which encodes amino acid sequence <SEQ ID 9554> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF98277 GB:AF197933 biotin carboxyl carrier protein 
[Streptococcus pneumoniae] 
Identities = 103/169 (60%), Positives = 127/169 (74%), Gaps = 11/169 (6%)

Query: 19 LDIQEIKDLMTQFDESSLREFSFICrSDGELSFSKSEGKRPLVP'TMSPMSHQPEATPTIAT 78 

+++ +IKDLMTQFD+SSLREFS+K EL FSKNE + VP ++ Q p +AT 
Sbjct: 1 MNIJSIDIKDLMTQFDQSSLREFSYKNGTDELQFSKNEARP- -VPEVAT QVAPAPVLAT 55 

Query: 79 PVSNEAGEQTKQATEWSEIP ESTVTVAEGDWESPLVGVAYLASGPDKPNFVSVGD 135 

P + + A V E+P E++V EG++VESPLVGV YIA+GPDKP FV+VGD 

Sbjct: 56 P--SPVAPTSAPAETVAEEVPAPAEASVAT-EGNLTOSPLVGVVYLAAGPDKPAFVTVGD 112 
Query: 136 SVKKGQTLMIIEAMKUMNEVPAPHDGWTEILVMJEEVIEFGKGLWIK 184

SVKKGQTL+ 1 IEAMKVMNE+PAP DGVVTEILV+NEE++EFGKGLVRIK 
Sbjct: 113 S VKKGQTLVI IEAMKVMNE I PAPKDGWTE I LVSNEEMVE FGKGLVRI K 161 



A related DNA sequence was identified in S.pyogenes <SEQ ID 5203> which encodes the amino acid
sequence <SEQ ID 5204>. Analysis of this protein sequence reveals the following: 

Possible site: 49 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.3132 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 107/171 (62%), Positives = 126/171 (73%), Gaps = 10/171 (5%) 

Query: 19 LDIQEIKDLMTQFDESSLREFSFKTSDGELSFSKNEGKAPLVPTMSPMSHQPEATPT--- 75 

L+IQEIKDLM QFD SSLREF FKT++GEL FSKNE + S+Q A P 

Sbjct: 1 LNIQE1KDLMAQFDTSSLREFLFKTNEGELIFSKNEQHLN ASTSNQEHAVPVPQV 55 

Query: 76 - - IATPVSI^GEQTKCATEWSEIPESWTVAEGDVVESPLVGVAYLASGPDKPNFVSV 133

+ P 44EA V E P++ VAEGD+VESPLVGVAYLA+ PDKP FV+V 

Sbjct: 56 QLVPNPTASEASSPASVKDVPVEEQPQAESFVAEGDIVESPLVGVAYLAASPDKPPFVAV 115 

Query: 134 GDSVKKGQTLMIIEAMKVMNEVPAPHDGVVTEILVANEEVIEFGKGIiVRIK 184 

GD+VKKGQTL+ 1 IEAMKVMNEVPAP DGV+TEILV+NE+VIEFG4GLVRIK 
Sbjct: 116 GDTVTCKGQTLVIIEAMKVMNEVPAPCDGVITEIIiVSNEDVIEFGQGLvRIK 166 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1678 

A DNA sequence (GBSx1782) was identified in S.agalactiae <SEQ ID 5205> which encodes the amino
acid sequence <SEQ ID 5206>. This protein is predicted to be beta-hydroxyacyl-ACP dehydratase (fabZ).
Analysis of this protein sequence reveals the following:

Possible site: 59 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2267 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF98278 GB:AF197933 beta-hydroxyacyl-ACP dehydratase

[Streptococcus pneumoniae] 
Identities = 130/140 (92%) , Positives = 135/140 (95%) 



Query: 1 MIDIKEIREALPHRYPMLLVDRVLEVSEDEIVAIKN'/SINEPFFNGHFPEYPVMPGVLIM 60 

MIDI+ I+EALPHRYPMLLVDRVLEVSED IVAIKNV+INEPFFNGHFP+YPVMPGV+IM 
Sbjct: 1 MIDIQGIKFALPHRYPMLLVDRVljEVSEDTIVAIKNVTINEPFFNGHFPQYPVMPGWIM 60 

Query: 61 EAl^QTAGVLELSKEENKGKLVFYAG^KvlO?KKQvTO 120 

EALAQTAGVLELSK ENKGKLVFYAGMDKVKFKKQWPGDQLVMTA FVKRRGTIAWEA 
Sbjct: 61 EAIJiQTAGVlELSKPENKGKLVFYAG>CiKVKFKKQWPGDQLvMTATFWRRGTIAVVEA 120 



Query: 121 IAEVDGK1AASGTLTFAIGN 140 
AEVDGKLAASGTLTFAIGN 
Sbjct: 121 KAEVDGKLAASGTLTFAIGN 140

A related DNA sequence was identified in S.pyogenes <SEQ ID 5207> which encodes the amino acid 
sequence <SEQ ID 5208>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1882 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 127/139 (91%) , Positives = 133/139 (95%) 

Query: 1 MIDIKEIRFJULPHRYPMLLVDRVLEVSEDEIVAIKNVSINEPFFNGHFPEYPVMPGVLIM 60 

M+DI+EI+ ALPHRYPML1A7DRVLEVS+D IVAIKMV+INEPFFNGHFP YPVMPGVLIM 
Sbjct: 1 MMDIREIQAALPHRYPMLLVDRVLEVSDDHIVAIKNVTINEPFFNGHFPHYPVMPGVLIM 60

Query: 61 FAIAQTAGVLELSKEENKGKLVFYAGMDKVKFKKQWPGDQLvMTAKFVKRRGTIAVVEA 120 

EALAQTAGVLELSKEENKGKLVFYAGMDKVKFKKQWPGDQLVMTA F+KRRGTIAWEA 
Sbjct: 61 FJU^QTAGVLELSKEENKGKLVFYAGMDKVKFKKQWPGDQLVMTATFIKRRGTIAWBA 120 

Query: 121 IAEVDGKLAASGTLTFAIG 139 

AEVDGKLAASGTLTFA G 
Sbjct: 121 RAEVDGKLAASGTLTFACG 139 
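
By way of illustration only, the identity and positive counts reported for an alignment can be recounted from its three rows. In the middle row of a BLAST alignment a letter marks an identical residue and "+" marks a conservative substitution. The following is a minimal sketch; the function name `count_matches` is hypothetical and the short sequences are an illustrative fragment, not one of the sequences of the sequence listing.

```python
def count_matches(query, midline, subject):
    """Count identities, positives and gapped columns in one alignment row.

    Identities are columns where query and subject carry the same residue;
    positives are identities plus columns marked '+' in the middle row.
    """
    if not (len(query) == len(midline) == len(subject)):
        raise ValueError("alignment rows must have equal length")
    identities = sum(
        1 for q, s in zip(query, subject) if q == s and q != "-"
    )
    positives = identities + midline.count("+")
    gaps = sum(1 for q, s in zip(query, subject) if q == "-" or s == "-")
    return identities, positives, gaps


# Illustrative fragment only (hypothetical example rows).
query = "MIDIKEIREAL"
midline = "MIDI+ I+ AL"
subject = "MIDIQGIKFAL"
print(count_matches(query, midline, subject))
```

Summing the three counts over all rows of an alignment gives figures that can be compared with the summary line preceding it.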

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1679 

A DNA sequence (GBSx1783) was identified in S.agalactiae <SEQ ID 5209> which encodes the amino
acid sequence <SEQ ID 5210>. This protein is predicted to be acetyl-coenzyme A carboxylase, biotin 
carboxylase (accC). Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1203 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF98279 GB:AF197933 acetyl-CoA carboxylase biotin carboxylase
subunit [Streptococcus pneumoniae] 
Identities = 361/451 (80%) , Positives = 405/451 (89%) 

Query: 1 MFKKILIANRGEIAWIIRAAREMGISTVAIYSEADKESLHTILADEAICVGPAKSAESY 60 

MF+KILIANRGEIAVRIIRAARE+GI+TVA+YS ADKE+LHT+LADEA+C+GP K+ ESY 
Sbjct: 1 MFRKILIANRGEIAWIIRAAREI^IATVAVYSTADKEALHTLLADEAVCIGPGKATESY 60 

Query: 61 LNVNAILSAAIVTGAEAVHPGFGFLSENSKFATMCSEMNLKFIGPSGEVMDroiGDKINAR 120 

LN+NA+LSAA++T AEA+HPGFGPLSENSKFATMCEE+ +KFIGPSG VMD MGDKINAR 
Sbjct: 61 LNimVLSAAVIjTEAEAIHPGFGFl^ENSKFATMCEEVGIKFIGPSGHVMDMMGDKINAR 120 

Query: 121 TEMIKADVPVIPGSDGQVTSVEEAVSIAEEIGYPLMLKASAGGGGKGIRKVKSADELKPA 180 

+MIKA VPVIPGSDG+V + EEA+ +AE+1GYP+MLKASAGGGGKGIRKV+ D+L A 
Sbjct: 121 AQMIKAGVPVIPGSDGEVHNSEFALrVAEKIGYPVMLKaSAGGGGKGIRKVEKPDDLVSA 180 
Query: 181 FESASQEALAAFGN6AMYIEKVIYPARHIEVQILGDSFGKIVHLGERDCSLQRNNQKVLE 240

FE+AS EA A +GNGAMYIE4VIYPARHIEVQILGD G ++HLGERDCSLQRNNQKVLE 
Sbjct: 181 FETASSEAKAOTGNGAMYIERVIYPARHIEVQILGDEHGHVIHLGERDCSLQRNNQKVLE 240 

Query: 241 ESPSVAIGOT'LRQQIGEAAVEAAEAVSTENAGTIEFLLDENSGQFyFMEMNTRVQVEHPV 300 

ESPS+AIG TLR +IG ARVRAAE V YENAGTIEFLLDE S FYFMEMNTRVQVEHPV 
Sbjct: 241 ESPSIAIGKTLRHEIGAAAVRAREFVGYENAGTIEFLLDEASSNFYFMEMNTRVQVEHPV 300 

Query: 301 TEFVTGVDIVKEQIRIAAGIPLSVSQNDIJCLTGHAIECRIIJAENPQFNFAPCPGTINGLH 360
TEFV+GVDIVKEQI IAAG PLSV Q DI L GHAIECRINAENP FNFAP PG I L+ 
Sbjct: 301 TEFVSGVD1VKEQICIAAGQPLSVKQEDIVLRGHAIECRINAENPAFNFAPSPGKITNLY 360 

Query: 361 LPAGGMGLRVDSAVYTGYTIPPYYDSMIAKVIVHGENRFDALMKMQRALYELEIDGIVTN 420 

LP+GG+GLRVDSAVY GYTIPPYYDSMIAK+IVHGENRFDALMKMQRALYELEI+G+ TN 
Sbjct: 361 LPSGGVGLRVDSAVYPGYTIPPYYDSMIAKIIVHC-ENRFDALMKMQRALYELEIEGVQTN 420 

Query: 421 TEFQMDLISDKKVLAGDYDTSFLMEDFLPRY 451 

+FQ+DLISD+ V+AGDYDTS FLME FLP+Y 
Sbjct: 421 ADFQLDLISDRNVIAGDYDTSFLMETFLPKY 451 

A related DNA sequence was identified in S. pyogenes <SEQ ID 5211> which encodes the amino acid
sequence <SEQ ID 5212>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1784 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 369/451 (81%), Positives = 421/451 (92%) 

Query: 1 MFKKILIANRGEIAVRIIRAAREMGISTVAIYSEADKESLHTILADFAICVGPAKSAESY 60 

MFKKILIANRGEIAVRIIRAARE+GISTVA+YSEADKE+LHTILADFAIC+GPA+S ESY 
Sbjct: 17 MFKKILIANRGEIAVRIIRAARELGISTVAVYSEADKEALHTILADEAICIGPARSKESY 76 



Query: 121 TEMIKADVPVIPGSDGQVTSVEEAVSIAEEIGYPLMLKASAGGGGKGIRKVKSADELKPA 180 

+EMIKA VPVIPGSDG+V + +EA++IA + IGYP+MLKASAGGGGKGIRKV++ +L+ A 
Sbjct: 137 SEMIKAGVPVIPGSDGEVYNAQEALAIANKIGYPVMLICASAGGGGKGIRICVETEADLEAa 196 

Query: 181 FESASQFAIAAFGNGAMYIEKVIYPARHIEVQILGDSFGKIVHLGERDCSLQRNNQKVLE 240 

F +ASQEAL AFGNGAMY+EKVI YPARHIEVQILGD+ +G I +HLGERDCSLQRNNQKVLE 
Sbjct: 197 FNAASQEALGAFGNGAIvraEKVIYPARHISVQILGDAYGNIIHLGERDCSLQRNNQKVLE 256 

Query: 241 ESPSVAIGNTLRQQIGEAaVRaAEAVSYENAGTIEFLLDENSGQFYFMEMNTRVQVEHPV 300 

ESPS+AIGNTLR ++G+AAVRAAEAV+ YEMAGT IEFLLDE+S +FYFMEMNTR+QVEHPV 
Sbjct: 257 ESPSIAIGNTLRHEMGQAftVRAftEAVAYENAGTIEFLLDEDSEKFYFMEMNTRIQVEHPV 316 

Query: 301 TEFVTGVDIVKEQIRIAAGIPLSVSQNDIKLTGHAIECRINAENPQFNFAPCPGTINGLH 360 

TEFVTGVDIVKEQI+IAAG PL+++Q DI +TGHAIECRINAEN FNFAP PG I L+ 
Sbjct: 317 TEFVTGVDIVKEQIKIARGQPLAINQEDITITGHAIECRINAENTAFNFAPSPGKITDLY 376 

Query: 361 LPAGGMGLRVDSAVYTGYTIPPYYDSMIAKVIvHGENRFDALMKMQRMjYELEIDGIVTN 420 

+P+GG+GLRVDSAVY GY IPPYYDSMIAK+IVHG NRFDALMKMQRAL ELEI+GI+TN 
Sbjct: 377 MPSGGVGLRVDSAVYNGYAIPPYYDSMIAKIIVHGSNRFDALMKMQRALVELEIEGIITN 436 

Query: 421 TEFQMDLISDKKVIAGDYDTSFLMEDFLPRY 451 

T+ FQ+DLI SDK+ V+AGDYDTS FLME FLP Y 
Sbjct: 437 TDFQLDLISDKRVIAGDYDTSFLMETFIiPHY 467 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1680 

A DNA sequence (GBSx1784) was identified in S.agalactiae <SEQ ID 5213> which encodes the amino
acid sequence <SEQ ID 5214>. This protein is predicted to be acetyl-CoA carboxylase beta subunit (accD). 
Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3571 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF98280 GB:AF197933 acetyl-CoA carboxylase beta subunit 
[Streptococcus pneumoniae] 
Identities = 221/285 (77%) , Positives = 248/285 (86%) , Gaps = 1/285 (0%) 

Query: 1 MALFSKKDKYIRISPNKALGSSDKRSLPEVPDELFAKCPSCKHMIYQKDLGLAKICPACS 60
MALFSKKDKYIRI+PN+++ + PEVPDELF++CP CKH IYQKDLG +ICP CS
Sbjct: 1 MALFSKKDKYIRINPNRSVREKPQAK-PEVPDELFSQCPGCKHTIYQKDLGSERICPHCS 59

Query: 61 YNFRISAQERDLLTVDEDSFEELFTGIETKDPLNFPNYREKIAATROKTNLDEAVVTGLA 120
Y FRISAQERL LT+D +F+ELFTGIE+KDPL+FP Y++KLA+ R+KT L EAWTG A

IKGQT AL IMDS+FIMASMGTWGEK+TRLFE AT +KLP+V+FTASGGARMQEGIMS

LMQMAK+SAAVKRHSN GLFYLTILTDPTTGGVTASFAMEGDIILAEPQ+LVGFAGRRVI

E TVRE LPE FQKAEFI1L1EHGFVDAI+ R +L D IA L+ HG
Sbjct: 240 ETnVRESLPEDFQKAEFLLEHGFVDAIVKRRDLPDTIASLVRIiHG 284

A related DNA sequence was identified in S.pyogenes <SEQ ID 5215> which encodes the amino acid
sequence <SEQ ID 5216>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4092 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 232/285 (81%) , Positives = 253/285 (88%) 

Query: 1 MALFSKKDKYIRISPNKALGSSDKRSLPEVPDELFAKCPSCKHMIYQKDLGLAKICPACS 60 

MALF KKDKYIRI+PN +L S -+PEVPDELFAKCP+CKRMIY+KDLGLAKICP CS 
Sbjct: 1 MALFRKKDKYIRITPNNSLKGSVSHNVPEVPDELFAKCPACKHMIYKKDLGLAKICPTCS 60 

Query: 61 YNFRISAQERLLLTVDEDSFEELFTGIETOPLNFPNYREKIAATRQKTNLDEAVVTGLA 120 
YNFRI SAQERL LTVDE SF+ELFT IETKDPL FP Y+EKL ++ T L EAV+TG A 
Sbjct: 61 ■mFRISAQERLTLTVDEGSFQELFTSIETKDPLRFPGYQEKLQKAKETTGLHEAVLTGKA 120

Query: 121 KIKGQTTALAIMDSHFIMASMGWGEKLTRLFELATEKKLPIVIFTASGGARMQEGIMS 180 

+K Q ALAIMDSHF IMASMGTWGEK+TRLFELA E+ LP+VIFTASGGARMQEGIMS 
Sbjct: 121 WKEQKIAIAIMDSHFIMASMGTO'GEKITRLFEIAIEENLPVVIFTASGGARMQEGIMS 180 

Query: 181 LMQMAKVSAAVKRHSNQGLFYLTILTDPTTGGVTASFAMEGDIILAEPQALVGFAGRRVI 240 

LMQMAKVSAAVKRHSN GLFYLTILTDPTTGGVTASFAMEGDIILAEPQ+LVGFAGRRVI 
Sbjct: 181 LMQMAKVSAAVKRHSNAGLFYLT1LTDPTTGGVTASFAMEGDIILAEPQSLVGFAGRRV1 240 

Query: 241 ETTVREDLPEGFQKAEFLLEHGFVDAII.MRTELRDCIAQLIAFHG 285 

ETTVRE+BP+ FQKAEFL +HGFVDAI+ RTELRD IA L+AFHG 
Sbjct: 241 ETTVRENLPDDFQKAEFLQDHGFVDAIVKRTELRDKIAHLVAFHG 285 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1681 

A DNA sequence (GBSx1785) was identified in S.agalactiae <SEQ ID 5217> which encodes the amino
acid sequence <SEQ ID 5218>. This protein is predicted to be acetyl-CoA carboxylase alpha subunit 
(accA). Analysis of this protein sequence reveals the following: 

Possible site: 50 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.22 Transmembrane 149 - 165 ( 149 - 165)

Final Results

bacterial membrane --- Certainty=0.1489 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9555> which encodes amino acid sequence <SEQ ID 9556> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF98281 GB:AF197933 acetyl-CoA carboxylase alpha subunit 
[Streptococcus pneumoniae] 
Identities = 186/254 (73%), Positives = 222/254 (87%) 

Query: 13 DVTRILKDARDQGRLTALDYAELIFDNFMELHGDRQFADDKSIIGGLGYLAGRPVTIVGI 72 

++ +I+++AR+Q RLT LD+A IFD F++LHGDR F DD 44+GG+G+L + VT+VGI 
Sbjct: 2 NIAKIVREAREQSRLTTLDFATGIFDEFIQLHGDR3FRDEGAVVGGIGWLGDQAVTWGI 61

Query: 73 QKGKNLQDNLDRHFGQPHPEGYRJCALRLMKQAEKFGRPVITFINTAGAYPGVGAEERGQG 132 

QKGK+LQDNL R+FGQPHPEGYRKALRLMKQAEKFGRPV+TFINTAGAYPGVGAEERGQG 
Sbjct: 62 QKGKSLQDNLIOiNFGQPHPEGYRKALRLMKQAEKFGRPvVTFINTAGAYPGVGAEERGQG 121

Query: 133 EAIARNLLEMSDLKVPIIAIIIGEGGSGGALALAVADKVWMLEHTVYSILSPEGFASILW 192 
FAIARNL4-EMSDLKVPIIAIIIGEGGSGGALAIAVAD+VWMLE+++Y+ILSEEGFASILW 
Sbjct: 122 EAIARNLMEMSDLKVPI IAI I IGEGGSGGALALAVADRVWMLENSIYAILSPEGFASILW 181

Query: 193 KDGTRTTEAAQLMKMTAGELYHMEWDKVIPEHGYFSSEITOMIKTSLISELEVLSQLSL 252 

KDGTR EAA+LMK+T+ EL M+WDKVI EG S E++ +K L +EL LSQ L 
Sbjct: 182 KDGTRAMEAAELMKITSHELLEMDVVDKVISEIGLSSKELIKSVKKELQTELARLSQKPL 241 

Query: 253 EDLLEQRYQRFRKY 266 

E+LLE+RYQRFRKY 
Sbjct: 242 EELLEERYQRFRKY 255 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5219> which encodes the amino acid 
sequence <SEQ ID 5220>. Analysis of this protein sequence reveals the following: 
Possible site: 61

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.22 Transmembrane 139 - 155 ( 139 - 155)

Final Results

bacterial membrane --- Certainty=0.1489 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAF98281 GB:AF197933 acetyl-CoA carboxylase alpha subunit 
[Streptococcus pneumoniae] 
Identities = 189/254 (74%) , Positives = 225/254 (88%) 

DVSRILKEARDQGRLTTLDyANLIFDDFMELHGDRHFSDDGAIVGGLAYLAGQPVTVICSI 62 
++++I++EAR+Q RLTTLD+A IFD+F++LHGDR F DDGA+VGG+ +L Q VTV+GI 
NIAKIWEAREQSRLTTLDFATGIFDEFIQLHGDRSFRDDGAWGGIGWLGDQAVTWGI 61 



QKGK4LQDNL RNFGQP+PEGYRKALRLMKQAEKFGRPWTFINTAGAYPGVGAEERGQG 



EAIA+NLMEMSDLKVPI IAI I IGEGGSGGALALAVAD+VWMLEN+ +YA+LSPEGFAS ILW 



KDG+RA EAAELMKIT+ EL +M +VD++I E G S E++ +K L 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 204/254 (80%) , Positives = 236/254 (92%) 

Query: 13 DVTRILKDARDQGRLTALDYAELIFDNFMELHGDRQFADDKSIIGGLGYLAGRPVTIVGI 72 

DV+RI LK+ARDQGRLT LDYA LIFD+FMELHGDR F+DD +I+GGL YLAG+PVT++GI 
Sbjct: 3 DVSRILKEARDQGRLTTLDYANLIFDDFMELHGDRHFSDDGAIVGGLAYLAGQPVTVIGI 62

Query: 73 QKGKNLQDNLDRHFGQPHPEGYRKALRLMKQAEKFGRPVITFINTAGAYPGVGAEERGQG 132 

QKGKNLQDNL R+FGQP+PEGYRKALRLNKQAEKFGRPV+TFINTAGAYPGVGAEERGQG 
Sbjct: 63 QKGKNLQDNLARNFGQPNPEGYRK^RLMKQAEKFGRPVVTFINTAGAYPGVGAEERGQG 122 

Query: 133 E^IARNLLEMSDLKVPIIAIIIGEGGSGGALAIAVADKWMLEHTVYSILSPEGFASILW 192 

EAIA+NL+EMSDLKVPI IAI I IGEGGSGGALALAVAD-)- VWMLE+T+Y++LSPEGFASILW 
Sbjct: 123 EAIAKNLMEMSDLKVPI IAI I IGEGGSGGALAIAVADQVWMLENTMYAVLSPEGFAS ILW 182 

Query: 193 KDGTRTTEARQLMKMTAGELYHMEWDKVIPEHGYFSSEIVDMIKTSLISELEVLSQLSL 252 

KDG+R TEAA+LMK+TAGELY M +VD++IPEHGYFSSEIVD+IK +LI ++ L L 
Sbjct: 183 KDGSRATEAAELMKITAGELYKMGIVDRIIPEHGYFSSEIVDIIKANLIEQITSLQAKPL 242 

Query: 253 EDLLEQRYQRFRKY 266 

+ LL++RYQRFRKY 
Sbjct: 243 DQLLDERYQRFRKY 256 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1682

A DNA sequence (GBSx1786) was identified in S.agalactiae <SEQ ID 5221> which encodes the amino
acid sequence <SEQ ID 5222>. This protein is predicted to be sakacin A production response regulator. 
Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3304 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9557> which encodes amino acid sequence <SEQ ID 9558> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAA88824 GB:AB016077 sakacin A production response regulator 
[Streptococcus mutans] 
Identities = 76/142 (53%), Positives = 99/142 (69%) 

Query: 36 MQTFKAKGQWU^SFTELSRALEQRMMFK^IQRVSNWANQAQVGRPHFWVYYRKDTDQLD 95

M K GQ AR FTE+++ L ++ F4M RVSNWANQAQV RPHFW YY++ D D
Sbjct: 1 MIALKTLGQSARAEFTElAKVLALKVSPFENtMRVSNWANQAQVVRPHFWCYYKQPEDNQD 60

Query: 96 DVAVALRWGVKDSFGVSLEVSFVEI^KSDKTLEKQARV1.SIPIASPLYFMVQRQGETHR 155
DV +A+R+YG +FG+S+EVSF+ER+KS TL KQ +VL IPIA PLY+ Q + E+HR

Sbjct: 61 DVGLAIRLYGNSANFGISVEVSFIEFIKKSKATLAKQHKVLDIPIAEPLYYFAQEKSESHR 120

Query: 156 EEGNEENRQRLMQEIKSGKVRK 177
G E RQ L Q++ G+VRK
Sbjct: 121 VSGTEAYRQMLRQKVADGQVRK 142

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1683

A DNA sequence (GBSx1787) was identified in S.agalactiae <SEQ ID 5223> which encodes the amino
acid sequence <SEQ ID 5224>. This protein is predicted to be seryl-tRNA synthetase (serS). Analysis of 
this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1866 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB11789 GB:Z99104 seryl-tRNA synthetase [Bacillus subtilis] 
Identities = 262/425 (61%) , Positives = 322/425 (75%) , Gaps = 1/425 (0%) 

Query: 1 MLDLKRIRTDFDWAKKLATRGVDQETLTTIiKELDI KRRELLI KAEEAKAQRNVASAAI A 60 

MLD K +R +F + KL +G D + IiD +RREL+ K EE K +RN S +A 

Sbjct: 1 MLDTKMLPJ^FQElKAKLVHKGEDLTDFDKFEALDDRRRELIGKvEELKGKRNEVSQQVA 60 
KR K++AD I M+ + +IK +D KL V+A L +



MLD H E YTEVI PPYMVN SM GTGQ PKF+ED F++ + + LIPTAEVP+TN + 



RDEI+ G LPI + A S FRSEAGSAGRDTRGLIR HQF+KVE+VKF KPE+SY+ELE 



K+T AE +LQ L LPYRV+++CTGD+GF+AAK YD+EVWIP+Q+TYREISSCSN E FQ 



ARRA IR+R E GK +HTLNGSGLAVGRTVAAILENYQ EDGSV IP+VLRPYMGN 



A related DNA sequence was identified in S.pyogenes <SEQ ID 5225> which encodes the amino acid 
sequence <SEQ ID 5226>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2453 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 357/424 (84%) , Positives = 386/424 (90%) 

Query: 1 MLDLKRIRTDFDWAKKLATRGVDQETLTT1.KELDIKRRELLIKAEERKAQRNVASAAIA 60
MLDLKRIRTDFD VA KL RGV ++TLT LKELD KRR LL+++EE KA+RN+ASAAIA
Sbjct: 1 MLDLKRIRTDFDTVAAKLKNRGVSEDTL'THLKELaEKRRALLVQSEELKAERNIASAAIA 60

Query: 61 QAKl^KENADEQIAAMQTLSADIKAIDAELADVDANlQSmmrtjPNTPADDVPLGADEDE 120
QAKR KE+A +QIA MQ +SADIK ID +L +D + ++TVLPNTP D VP+GADE++

MLDEH KEGY E+I PYMVNHDSMFGTGQYPKFKEDTFELAD+ FVLIPTAEVPLTNYYR

EI+DGKELPIYFTAI'ISPSFRSEAGSAGRDTRGLIRLHQFHKVEMVKFAKPEESYQELEK

MTANAENILQKL LPYRVI+LCTGDMGFSAAKTYDLEWIPAQiYTYREISSCSNTEDFQA

Query: 361 RRAQIRYRDEVDGKVRLLHTLl^GSC i ZI GI I J'.M jSNYQNEDGSVTIPEVLRFSMGHID 420
RRAQIRYRDE DGKV+LLHTLHGSGLAVGRTVAAIIjENYQNEDGSVTIPEVLRPYMG

Query: 421 IIKE 424
+1 P
Sbjct: 421 VISP 424



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.



Example 1684 

A DNA sequence (GBSx1788) was identified in S.agalactiae <SEQ ID 5227> which encodes the amino
acid sequence <SEQ ID 5228>. Analysis of this protein sequence reveals the following: 

Possible site: 36

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-11.36 Transmembrane 313 - 329 ( 306 - 332)
INTEGRAL Likelihood = -9.24 Transmembrane 159 - 175 ( 155 - 179)
INTEGRAL Likelihood = -4.19 Transmembrane 20 - 36 ( 16 - 37)
INTEGRAL Likelihood = -3.29 Transmembrane     - 287 ( 271 - 287)
INTEGRAL Likelihood = -2.97 Transmembrane 210 - 226 ( 209 - 227)
INTEGRAL Likelihood = -2.87 Transmembrane 242 - 258 ( 241 - 258)
INTEGRAL Likelihood = -2.13 Transmembrane 52 -     ( 50 - 68)

Final Results

bacterial membrane --- Certainty=0.5543 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9559> which encodes amino acid sequence <SEQ ID 9560> 
was also identified. 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA07406 GB:AJ006986 transmembrane protein [Streptococcus pneumoniae] 
Identities = 72/330 (21%), Positives = 143/330 (42%), Gaps = 32/330 (9%) 

Query: 14 RHYGLDLLRIISMFMIVITHVLGKGGLRSSVEGHADSYFIVTWIIQVLVYGAVNCYALIS 73

R+ LDLL++++ +V+ H GG + + + +Y + ++ VN Y L+
Sbjct: 5 RNINLDLLKVLACVG WLLHTT - .MGGFKETGAWNFLTYLYYLGTYS I PLFFMVNGYLLL - 62



Query: 74 GYVGIN SRYRYSKLLS IWAQVFFYTFT I TALFAITGHE VTLLNWRDAFFPIVSG 127 

G I Y K+ + V +TF I LF E + L + FF 
Sbjct: 63 GKREITYSYILQKIKWLLITVS8WTF-IVWLFKRDFTENLIKKIIGSLIQKGYFF 116 

Query: 128 QYWYITAYFGLLVFMPVINNGLNALTDKQLKQLVLLMFI--IFSILPAVLNNRVPEFSLS 185 

Q+W+ A + + +P++ LN+ L L LLM I IF + +L + + + 

Sbjct: 117 QFWFFGALILIYLCLPILRQFLNS-KRSYLYSDSLLMTIGLIFELSNILLQMPIQTYVIQ 175 

Query: 186 KGFEMTWLLILYIIGAYLKRIDL NIFKTSYLLIIYLLSLVATYAMKFSVGDIW- - - 238 

TW Y++G Y+ + 4- + FK ++ LL L++ + F 1 + 
Sbjct: 176 TFRLWTW-FFYYLLGGYIAQFTIEEIESRFKNWMKIVSILLLLISPIILFFIAKTIYHNL 234 

Query: 239 YWYVSPTLTLGAVSLFILFARASIKPSGFLKKIIVVIiAPSTLGVYLCHLHPLIVKYF 295 

Y+Y + + + + +F+ ++ + ++ IV L+ T+GV++ +H I+K + 

Sbjct: 235 FAEYFYDTLFVKVSTLGIFLTILMLTLNEN- -RRESIVSLSNQTMGVFI - - IHTYIMKVW 290 

Query: 296 VRDFAETFVYESIYLYPFLILGAGILIYLL 325 

+ FV + F + +■ I++ +L 

Sbjct: 291 EKvLGFNFVGAYLLFALFTLSVSFIIVGML 320 



No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1685 

A DNA sequence (GBSx1789) was identified in S.agalactiae <SEQ ID 5229> which encodes the amino
acid sequence <SEQ ID 5230>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2752 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9561> which encodes amino acid sequence <SEQ ID 9562> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MAQSLNKTVEFQTTGVSYLGMGNKVGKFLVGDQALEFYNDKNVNDYIQIPWTSINQIGAN 60

MAQSLNKTVE TTGVSY+ +G KVGKFL+GD ALEFY D NV YIQIPWTSI QIGAN 
Sbjct: 1 ^QSI^KTVEIiHTTGVSYMMGGKVGKFLIGDVAliEFyPDvNvEQYIQIPWTEITQIGAH 60 

Query: 61 VSRKKISRHFEVFTDQGKFLFASKDSGTILKHARRHIGDDKWKLPTLIQTI 112 

VS K+ISRHFEV TD+ KFLFASKDSG ILK AR H+G++KWKLPTLIQTI 
Sbjct: 61 VSGKRISRHFEVLTDKSKFLFASKDSGKILKIAREHLGNEKWKIiPTLIQTI 112 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5231> which encodes the amino acid
sequence <SEQ ID 5232>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3301 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 87/116 (75%) , Positives = 101/116 (87%) 

Query: 1 MAQSLNKTVEFQTTGVSYLGMGNKVGKFLVGDQALEFYNDKNVNDYIQIPWTSINQIGAN 60

MAQSLN +VE++T VSYLGMG KVG L+GD+ALEFYNDKNVNDYIQIPWT+IN IGAN 
Sbjct: 1 mQSIMTSVEYKTKAVSYLGMGGKVGHILLGDKALEFYNDKimiDYIQIPWTAINHIGAN 60 

Query: 61 VSRKKISRHFEVFTDQGKFLFASKDSGTILKHARRHIGDDKWKLPTLIQTILKIF 116 

VSRKK+SRHFE+FTDQGKFLFAS DSG ILK R+HIG4+KV+ LPTL+QT + F 
Sbjct: 61 VSRKKVSRHFEIFTDQGKFLFASGDSGKILKITRQHIGNEKVITLPTLMQTFINKF 116 
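The statistics line that heads this alignment can be checked mechanically. The following is a minimal sketch, not part of the original analysis: the function names `parse_blast_stats` and `truncated_percent` are hypothetical, and the rule that the reported percentage is the integer part of count/length is an assumption that happens to agree with this particular line.

```python
import re

# Matches one field such as "Identities = 87/116 (75%)"; whitespace is
# optional everywhere because the scanned listing spaces tokens unevenly.
STAT_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_blast_stats(line):
    """Return {field: (count, aligned_length, reported_percent)}."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)), int(m.group(4)))
        for m in STAT_RE.finditer(line)
    }


def truncated_percent(count, aligned_length):
    """Integer percentage with the fraction dropped (assumed convention)."""
    return (100 * count) // aligned_length


# Statistics line of the GAS/GBS alignment above, copied as printed.
line = "Identities = 87/116 (75%) , Positives = 101/116 (87%)"
for field, (count, length, reported) in parse_blast_stats(line).items():
    recomputed = truncated_percent(count, length)
    print(field, f"{count}/{length}", "reported", reported, "recomputed", recomputed)
```

A mismatch between the reported and recomputed values on another line would more likely point to a scanning error in one of the digits than to a different alignment.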

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1686

A DNA sequence (GBSx1790) was identified in S.agalactiae <SEQ ID 5233> which encodes the amino
acid sequence <SEQ ID 5234>. This protein is predicted to be mannose-specific phosphotransferase system 
component IID (manZ). Analysis of this protein sequence reveals the following: 
Possible site: 39

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -8.92 Transmembrane 281 - 297 ( 279 - 302)
INTEGRAL Likelihood = -4.88 Transmembrane 187 - 203 ( 185 - 205)
INTEGRAL Likelihood = -4.35 Transmembrane 250 - 276 ( 257 - 277)
INTEGRAL Likelihood = -1.01 Transmembrane 129 - 145 ( 129 - 145) 

Final Results 

bacterial membrane Certainty=0.45S7 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 
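The INTEGRAL lines in the listing above follow a fixed layout (likelihood, predicted segment, and a wider range in parentheses), so they can be read into records for sorting or filtering. This is an illustrative sketch only: the `Helix` record and `parse_helices` function are hypothetical names, and the field meanings are inferred from the layout rather than stated in the text.

```python
import re
from typing import NamedTuple


class Helix(NamedTuple):
    likelihood: float   # more negative = stronger membrane-segment prediction
    start: int          # first residue of the predicted segment
    end: int            # last residue of the predicted segment
    range_start: int    # first residue of the wider range in parentheses
    range_end: int      # last residue of the wider range in parentheses


# "[({]" tolerates a "{" that optical scanning can substitute for "(".
LINE_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*[({]\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_helices(text):
    """Extract every well-formed INTEGRAL line from a block of text."""
    return [
        Helix(float(m.group(1)), int(m.group(2)), int(m.group(3)),
              int(m.group(4)), int(m.group(5)))
        for m in LINE_RE.finditer(text)
    ]


# The four lines reported for this protein, copied from the listing.
PSORT_TEXT = """
INTEGRAL Likelihood = -8.92 Transmembrane 281 - 297 ( 279 - 302)
INTEGRAL Likelihood = -4.88 Transmembrane 187 - 203 ( 185 - 205)
INTEGRAL Likelihood = -4.35 Transmembrane 250 - 276 ( 257 - 277)
INTEGRAL Likelihood = -1.01 Transmembrane 129 - 145 ( 129 - 145)
"""

for helix in sorted(parse_helices(PSORT_TEXT), key=lambda h: h.start):
    print(helix)
```

Lines with a missing coordinate simply fail to match and are skipped, which keeps damaged entries from being silently misread.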

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD46487 GB:AF130465 mannose-specific phosphotransferase system 
component IID [Streptococcus salivarius] 
Identities = 247/303 (81%) , Positives = 276/303 (90%) 

Query: 1 MTEQIKLSKSDRQKVWWRSQFLQGSWNYERMQNNIGWAYALIPALKKLYTTKEDRAAALER 60 

M E+I+LS++DR4.KVWWRSQFLC^SWNYERMQ1S+GWAY4.LIPA+KKL'YT KED+AAAL+R 
Sbjct: 1 MAEKIQLSQADRKKVWWRSQFLQGSWNYERMQNLGWAYSLIPAIKKLYTNKEDQAAALKR 60 

Query: 61 HMEFFNTHPYVAAPIIGOTLALEEEKASGTPVEDKAIQ<3VKIG^GPIAGIGDPVFWFTV 120 

H+EFFNTHPYVAAPI +GVTLALEEEKA+GT +ED AIQGVKIGMMGPLAGIGDPVFWFTV 
Sbjct: 61 HLEFFNTHPYVAAPIMGVTLALEEEKANGTDIEDAAIQJSVKIGmGPLAGIGDPVFWFTV 120 

Query: 121 RPILGALGASLASAGNILGPIIFFVGWNLIRMSFLWYTQELGYKSGKEITKDMSGGILQD 180 

RPILGALGASLA AGNI GP+IFF+GWNLIRM+FLWYTQELGYK+G EITKDMSGGIL+D 
Sbjct: 121 RPILGALGASLAQAGNIAGPLIFFIGWNLIRMAFLWYTQELGYKAGSEITKDMSGGILKD 180 

Query: 181 ITKGASILGMFIIAVLVTCRWVAINFTVDLPKKTLSEGAYINFPKDHVSGQQLHDILGQVQ 240 

ITKGAS ILGMFILAVLV+RWV+ 1 FTV+LP K LS+GAYI +PK +VSG QL ILGQV 
Sbjct: 181 ITKGASILGMFIIAVLVERWSIVFTWqLPGKVLSKGAYIEWPKGNVSGDQLKTILGQVN 240 

Query: 241 SGLSLDKMQPQTLQGQLDSLIPGLAGLLLTFFCMWLLKKKVSPITIIIGLFIVGILARLA 300 

LS DK+Q TLQ QLDSLIPGL GLLLTF CMWLLKKKVSPITIIIGLF+VGI+A 
Sbjct: 241 DKLSFDKIQVDTLQKQLDSLIPGLMGLLLTFACMWLLKKKVSPITIIIGLFWGIVASFF 300 

Query: 301 GVM 303 

Sbjct: 301 GIM 303 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5235> which encodes the amino acid
sequence <SEQ ID 5236>. Analysis of this protein sequence reveals the following: 

Possible site: 55 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -8.39 Transmembrane 284 - 300 ( 279 - 302) 
INTEGRAL Likelihood = -4.88 Transmembrane 261 - 277 ( 257 - 278) 
INTEGRAL Likelihood = -4.51 Transmembrane 181 - 197 ( 180 - 198) 

Final Results 

bacterial membrane --- Certainty=0.4354 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 



>GP:AAD46487 GB:AF130465 mannose-specific phosphotransferase system
component IID [Streptococcus salivarius]

Identities = 239/303 (78%), Positives = 268/303 (87%)

MTEQIKLTKSDRQRVWWRSQFLQGSWimiRMQNMGWAYALI PALKKLYTSPEDRAAALER 6 0 
M E+I+L+++DR+4-VWWRSQFLQGSWNYERMC3N+GWAY+IiIPA+KKI,YT+ ED+AAAL+R 
MAEKIQLSQADRKKVWWRSQFLQGSSfl^SRMQKLGVJAYSLIPAIKKLY'TNKEDQARALKR 60 

HMEFFNTHPYVAAPIIGVTLALEEERANGTPIDDKAIQGVKIGMMGPLAGIGDPVFWFTI 120 
H+EFFNTHPYVAAPI+GVTLALEEE+ANGT I+D AIQGVKIGMMGPLAGIGDPVFWFT+ 



RPILGALGASLA GNI GPL+FF GWNLIRKAFLWYTQE GYKAGSEITKDMSGGIL+D 












ITKGAS1LGMFILAVLV+RWSI FT++E.PGK LS GAY+ +P G 



+S DK+Q TLQ QLDSL1PGL GLLLTF CMWLLKKKVSPI IIIGLF GI+A 



Sbjct: 301 GIM 303

An alignment of the GAS and GBS proteins is shown below. 

Identities = 255/303 (84%), Positives = 277/303 (91%)

MTEQIKLSKSDRQKVWWRSQFLQGSWmTFJ^QMGWAYALIPALKKLYTTJCEDRAAALER 6 0 
MTEQIKL+KSDRQ+VWWRSQFLQGSWNYERMQNMGWAYALIPALKKLYT+ EDRAAALER 
MTEQIKLTKSDRQRVWmSQFLQGSWNYERMQNMGWAYALIPALKKLYTSPEDRAAALER 6 0 



ITKGASILGMFILAVLV+RWV+INFT+DLP K LS+GAY+ FP V G +L IL 



G+SLDK+Q QTLQGQLDSLIPGLAGLLLTF CMWLLKKKVSPI IIIGLF GILA LA 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1687 

A DNA sequence (GBSx1791) was identified in S.agalactiae <SEQ ID 5237> which encodes the amino
acid sequence <SEQ ID 5238>. Analysis of this protein sequence reveals the following:

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2580 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1688 

A DNA sequence (GBSx1792) was identified in S.agalactiae <SEQ ID 5239> which encodes the amino
acid sequence <SEQ ID 5240>. This protein is predicted to be mannose-specific phosphotransferase system 
component IIC (manY). Analysis of this protein sequence reveals the following: 
Possible site: 39 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -5.95 Transmembrane 142 - 158 ( 137 - 165) 
INTEGRAL Likelihood = -2.60 Transmembrane 65 - 81 ( 61 - 81) 
INTEGRAL Likelihood = -1.97 Transmembrane 103 - 119 ( 103 - 122) 

Final Results 

bacterial membrane --- Certainty=0.3378 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9301> which encodes amino acid sequence <SEQ ID 9302>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD46486 GB:AF130465 mannose-specific phosphotransferase system 
component IIC [Streptococcus salivarius] 
Identities = 134/186 (72%), Positives = 154/186 (82%), Gaps = 1/186 (0%)

Query: 1 MVKSGDFTQKGINFAFSTAVPLAIAGLFLTMIVRTISTALVHAGDKAASEGNFAAIERFH 60 

+VK G+FT +GI A +TA+PLA+AGLFLTM+VRT S ALVHA DKAA GN A +ER H 
Sbjct: 86 LVKGGNFTTEGIGVATATAIPLAVAGLFLTMLVRTASVALVHAADKAAESGNIAGVERAH 145 

Query: 61 FIALLLQGLRIAFPAALLLAIPSSSVQSILEAMPDWLNGGMQVGGAMWAVGYAMVINMM 120 

++ALLLQGLRIA PAALLLAIP+ SVQ L MP WLN GM VGG MWAVGYAMVINMM 
Sbjct: 146 yiiALLLQGLRIAVPAALLLAIPAESVQHALGLKPSWL^GIWVGGGMWAVGYAMVINMM 205 

Query: 121 ATREWPFFALGFAIAALNQLTLIAI'IGTIGVAIALIYISLSKMGGSK-GTSNAGSNDPIG 179 

ATREVWPFFA+GFA AA+ +QLTLIA+G IGVAIA IY++LSK GG G +++GS DPIG 
Sbjct: 206 ATREWPFFAIGFAFAAISQLTLIALGAIGVAIAFIYLNLSKQGGGNGGGTSSGSGDPIG 265 

Query: 180 DILEDY 185 

DILEDY 
Sbjct: 266 DILEDY 271 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5241> which encodes the amino acid 
sequence <SEQ ID 5242>. Analysis of this protein sequence reveals the following: 

Possible site: 36 
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood =-11.30 Transmembrane 



INTEGRAL Likelihood = 

INTEGRAL Likelihood = 

INTEGRAL Likelihood = 

INTEGRAL Likelihood = 



Transmembrane 226 - 242 

Transmembrane 102 - 118 

Transmembrane 71 - 87 

Transmembrane 150 - 166

Transmembrane 166 - 202 ( 18S - 202)
Transmembrane 37 - 53 ( 37 - 53)

Final Results 

bacterial membrane --- Certainty=0.5522 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAD46486 GB:AF130465 mannose-specific phosphotransferase system
component IIC [Streptococcus salivarius] 
Identities = 211/271 (77%), Positives = 237/271 (85%), Gaps = 2/271 (0%) 

MSDISIISAILWI IAFFAGLEGILDQFQMHQPLVACTIjIGLVTGHLFAGVILGGTLQML 60 
MSD+SI ISA1LW++AF AGLEGILDQFQ HQPLVACTLIG TG+L AG++LGG+LQM+ 
MSDMSI ISAILWWAFLAGLEGILDQFQFHQPLVACTLIGAATGNLTAGIMLGGSLQMI 6 0 

Query: 61 ALGWANIGAAVAPDAALASVAAAIIMVKSGDFTQKGITFAYSTAIPLAVAGLFLTMIWT 120 








AL WANIGAAVAPDAAIASVAAAm 



?AI PIAVAGLFLTM+VRT 



ALAWANIGAAVAPDAALASVAAAIILVKGGNFTTEGIGVATATAIPLAVAGLFLTMLWT 12 0 

LST7ALvHAGDKAAAEGNFAGIERFHFIALIjLQGLR1AVPAALLVAVPTSAVQSVLNAMPN 180 

S ALVHA DKAA GN AG+ER H++ALLLQGLRIAVPAALL+A+P +VQ L MP+ 
ASVALVHAADK^SGMIAGVERAHyiALLLCGLRIAVPAALLIAIPAESVQHALGLMPS 180 

WU^GMQIGGAIWVAVGYAWINMMATREVWPFFALGFALAAISQLTLIAT'IGVIGVAIAF 240 
WLN GM +GG MWAVGYAMVINMMATREVWPFFA+GFA AAISQLTLIA+G IGVAIAF 
WLNHGMVVGGGMWAVGYAMVINMMATREVWPFFAIGFAFAAISQLTLIALGAIGVAIAF 240 

IYLNLSKKGG- -NGGNAAGSADPIGDILEDY 269 
IYLNLSK+GG GG ++GS DPIGDILEDY 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 155/185 (83%), Positives = 173/185 (92%), Gaps = 1/185 (0%) 



FIALLLQGLRIA PAALL+A+P+S+VQS+L AMP-WLN GMQ+GGAMWAVGYAMVINMM 



ATREVWPFFALGFALAA++QLTLIAMG IGVAIA IY++LSK GG+ G + AGS DPIGD 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1689 

A DNA sequence (GBSx1793) was identified in S.agalactiae <SEQ ID 5243> which encodes the amino
acid sequence <SEQ ID 5244>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3171 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1690 

A DNA sequence (GBSx1794) was identified in S.agalactiae <SEQ ID 5245> which encodes the amino
acid sequence <SEQ ID 5246>. This protein is predicted to be pseudouridine synthase (rluC). Analysis of
this protein sequence reveals the following:

Possible site: 28

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2717 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB06566 GB:AP001516 unknown conserved protein [Bacillus halodurans]
Identities = 124/281 (44%), Positives = 171/281 (60%), Gaps = 8/281 (2%)

Query: 16 LLKSHDVSRGLLAKIKYRGGKI FVNGEEQNAI FLLE IGD WTID I PDE - PSHETL-EPVP 73

L + VS+ LA IK++GG I +NGEE + + D VT+++P E PS + EPVP
Sbjct: 24 LREGKHVSKRSIiAAIKFKGGTILIjNGEEVTVRETvHVNDQVTLELPHEYPSPSMIAEPVP 83

Query: 74 HDLDI1YEDDHFLILNKPFGFASIPSSIH-SNTIANFIICHYYVSNNYANQQVHIVTRLDR 132

D+IYE+DH+L++NKP G +IPS H T+AN + +Y+ A H V RLD+
Sbjct: 84 --FDVIYENDHYLVVNKPAGVPTIPSRDHPQC-TIANGLU^FQRQKMA-ATFHAVNRLDK 140

Query: 133 DTSGLMLFAKHGYAHARLDKQLQAKAIEKRYYALVSGSGDLADSGDIIAPIARDVDSIIT 192

DTSGL++ AKH AH +L KQ + I++ Y A+V G + + G I APIAR +S+IT
Sbjct: 141 DTSGLLIVAKHQLAHDQLSKQQRQGNIKRTYMAIVQGEIEQQE-GTITAPIARKEESLIT 199

Query: 193 RRVHESGKYAHTSYQWARYGDVRLVDIKLHTGRTHQIRVHFAHIGFPLLGDDLYGGRMD 252

R V E G+ A T ++V+ R +V ++L TGRTHQIRVHF+ + +G+PL GDDLYGG
Sbjct: 200 REVREDGQLAITHFKVIDRLNQGTIVQVQLETGRTHQIRVHFSYLGYPLFGDDLYGGERK 259

Query: 253 LGINRQALHCHSLSFYDPFMGKINKQTLDLTDDFDSVIMEL 293

GI RQALH L+ + PF T L D +I L
Sbjct: 260 -GIERQALHSTELTIHCPFTEVEQTFTEGLPPDMKELIRHL 299

A related DNA sequence was identified in S.pyogenes <SEQ ID 5247> which encodes the amino acid
sequence <SEQ ID 5248>. Analysis of this protein sequence reveals the following:

Possible site: 28

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2786 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below.

Identities = 223/294 (75%), Positives = 251/294 (84%), Gaps = 1/294 (0%)

Query: 1 MKEEYVAKERCKVKTDLKSHDVSRGLLAKIKYRGGKIFVNGEEQNAIFLLEISDWTIDI 60

M+FE+VA +R KWKTLLKS+DVS+GLIAKIKY+GG I VNG EQNAI+LL++GDVVTIDI 
Sbjct: 1 MREEFVADKRIKVKTLLKSYDVSKGLIAKIKYKGGNILVNGIEQNAIYLLQVGDWTIDI 60 



Query: 121 NQQVHIVTRLDRDTSGLMLFAKHGYAHARLDKQLQAKAIEKRYYALVSGSGDLADSGDII 180 

+QQVHIVTRLDRDTSGLMLFAKHGYAHARLDKQLQ ++IEKRY+ALVSG1-G L D GDII 
Sbjct: 121 DQQVHIVTRLDRDTSGLMLFAKHGYAHARLDKQLQTRSIEKRYFALVSGNGMLPDEGDII 180 

Query: 181 AP1ARDVDSIITRRVHESGKYAHTSYQWARYGD-VRLVDIKLHTGRTHQIRVHFAHIGF 239 

API R DSIITR V GKYA TSY+WARY + V LVDIKLF.TGRTHQIRVHFAHIGF 
Sbjct: 181 APIGRSKDSIITRAVDPMGKYAKTSYKVVARYSBNVHIiVDIKLHTGRTHQIRVHFAHIGF 240 

Query: 240 PLLGDDLYGGRMDLGINRQALHCHSLSFYDPFMGKINKQTLDLTDDFDSVIMEL 293 

PLLGDDLYGGR+DLGI RQALHCH L+F DPF + LTDDFDSVI+ L 

Sbjct: 241 PLLGDDLYGGRLDLGITRQALHCHYLNFKDPFTESDCSYAIHLTDDFDSVIIGL 294 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1691 

A DNA sequence (GBSx1795) was identified in S.agalactiae <SEQ ID 5249> which encodes the amino
acid sequence <SEQ ID 5250>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1521 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9845> which encodes amino acid sequence <SEQ ID 9846> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB13018 GB:Z99110 similar to hypothetical proteins [Bacillus subtilis]
Identities = 120/267 (44%), Positives = 174/267 (64%), Gaps = 3/267 (1%)

RVAIIANGKY'QSKRVASKLFAAFKHDPDFYLSKKDPDIVISIGGDGMLLSAFHMYEKQLD 72 
+ A+ + G S + SK+ A+ D D L + +P+IVIS+GGDG LL AFH Y +LD 
KFAVSSKGDQVSDTLKSKI-QAYLLDFDMELDENEPEIVISVGGDGTLLYAFHRYSDRLD 60

KVRWGVHTGHLGFYTDYRDFETOTLINNXPOT3KGEQISYPILKVTITD-EDGRVIRARA 131 
K FVGVHTGHLGFY D+ E++ h+ + + YP+L+V +T E+ R R A 



L E++S+NNRV+RT+GS +++P I P+ 4- ++ID+ T+ +K+V I 



FW+RV D+FIG+ E

A related sequence was also identified in GAS <SEQ ID 9137> which encodes the amino acid sequence
<SEQ ID 9138>. Analysis of this protein sequence reveals the following: 

Possible site: 16 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2190 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

RGD motif: 155-157 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 232/276 (84%) , Positives = 257/276 (93%) 
Query: 
Sbjct: 

Query: 61 LSAFHMYEKQLDKVRFVGTOTGHLGFYTDYRDFEVDTLINNLKNDKGEQISYPILKVTIT 120 

LSAFHMYEK+LDKVRPVG+HTGHLGFYTDYRDFEVD LI+NL+ DKGEQI SYPILKV IT 
Sbjct: 61 LSAFHMYEKELDKVRFVG1HTGHLGFYTDYRDFEVDKLIDNLRKDKGEQISYPILKVAIT 120 

Query: 121 LEDGRVIRARALNESTIKRIEKTMVADWINQWFERFRGDGILVSTPTGSTAYNKSLGG 180 

L+DGRV++ARALNE+T+KRIEKTMVADV+IN V FE FRGDGI VSTPTGSTAYNKSLGG 
Sbjct: 121 LDDGRVVKARADNFAWKRIEKTIWADVIINKVKFESFRGDG1SYSTPTGST 



Query: 181 AVLHPTIEALQLTEISSLNNRWRTLGSSVIIPKKDAIEIVPKRVGVYTISIDNKTVHYK 240 

AVLHPTIEALQLTEISSLNNRV+RTLGSS+IIPKKD IE+VPKR+G+YTI SIDNKT K 
Sbjct: 181 AVLHPTIEALQLTEISSLNNRVFRTLGSSIIIPK3CDKIELVPKRLGIYTISIDNKTYQLK 240 

Query: 241 NVTKIEYSIDEKSINFVSTPSHTSFWERVNDAFIGE 276 

NVTK+EY ID++ I+FVS+PSHTSFWERV DAFIGE 
Sbjct: 241 NVTKVEYFIDDEKIHFVSSPSHTSFWERVKDAFIGE 276 



A related GBS gene <SEQ ID 8879> and protein <SEQ ID 8880> were also identified. Analysis of this 
protein sequence reveals an RGD motif at residues 159-161. 
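Locating a short motif such as RGD and reporting it in 1-based residue numbering can be sketched as follows. This is an illustration under stated assumptions, not the method used in the original analysis: `find_motif` is a hypothetical helper, and the fragment is the tail of the GBS row that ends at residue 180 in the GAS/GBS alignment above, so its first residue number is derived from that end position.

```python
def find_motif(sequence, motif, first_residue=1):
    """Return 1-based (start, end) positions of every occurrence of motif.

    first_residue is the residue number of sequence[0], which lets a
    fragment be searched while keeping full-length numbering.
    """
    hits = []
    pos = sequence.find(motif)
    while pos != -1:
        start = first_residue + pos
        hits.append((start, start + len(motif) - 1))
        pos = sequence.find(motif, pos + 1)  # allow overlapping matches
    return hits


# Tail of the GBS alignment row printed above, which ends at residue 180.
fragment = "FRGDGILVSTPTGSTAYNKSLGG"
LAST_RESIDUE = 180
first = LAST_RESIDUE - len(fragment) + 1  # residue number of fragment[0]

print(find_motif(fragment, "RGD", first_residue=first))  # [(159, 161)]
```

The result agrees with the position reported here for the related GBS protein, on the assumption that the related protein and the aligned GBS row share the same residue numbering.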

The protein has homology with the following sequences in the databases: 

45.0/65.6% over 264aa

Bacillus subtilis

EGAD|107338| hypothetical protein Insert characterized OMNI|NT01BS1363 BC541A protein-related Insert characterized

SP|031612|YJBN_BACSU HYPOTHETICAL 30.0 KDA PROTEIN IN MECA-TENA INTERGENIC REGION. Insert characterized

GP|2633515|emb|CAB13018.1||Z99110 similar to hypothetical proteins Insert characterized
PIR|F69844|F69844 conserved hypothetical protein yjbN - Insert characterized

ORF02026(337 - 1134 of 1437)

EGAD|107338|BS1162(2 - 266 of 266) hypothetical protein {Bacillus subtilis} OMNI|NT01BS1363
BC541A protein-related SP|031612|YJBN_BACSU HYPOTHETICAL 30.0 KDA PROTEIN IN MECA-TENA
INTERGENIC REGION. GP|2633515|emb|CAB13018.1||Z99110 similar to hypothetical proteins
{Bacillus subtilis} PIR|F69844|F69844 conserved hypothetical protein yjbN - Bacillus
subtilis

%Match = 22.8
%Identity = 44.9 %Similarity = 65.5
Matches = 120 Mismatches = 89 Conservative Sub.s = 55



RKF*QKXKSELWL*IFGQPSNIH*ITSIRGTSLK:<IjNKDWRKQQKSL*OTM 



WO 02/34771 



PCT/GB01/04789 



IVMTQMNFTDRATRVAIIANGKYQSKRVAS^^ 

: |: : I I : 11= I « I I I = II III I =111 Nil 

MKFAVSSKGDQVSDTLKSKIQA-YLLDFDMELDENEP2IVISVGGDGCLLYAFHRYSDRLDKTAFVGV 



HTGHLGFYTDYRDFEVDTLimLKKTOKGEQISYPILKVTITL-EDGRVII^^RAnNESTIKRIEKTMVADVVINQVVFERF 

MINIM M M: M : = MMM M Ml I II II III II -MM I Ml I 

HTGHLGFYADWPHEIEKLVIAIAKTPYHTVEYPLLEVIVTYHENEREERYLALNECTIKSIEGSLVADVEIKGQLFETF 



804 834 864 894 924 954 984 1014 

RGDGILVSTPTGSTAYNKSLGGAVLHPTIFJU^QLTEISSIATffiVYRTLGSSVIIPKKDAIEIVPKRVGVYTISIDNKTVH 

MIM M 1 1 = 1 1 1 1 1 1 1 = 1 1 1 1 = - ! I = I IMI MMMIIIMMII I I: : : MM M 



160 170 180 190 200 210 220 

1044 1074 1104 1134 1164 1194 1224 1254 

YKNVTKIEYSIDEKSTNFVSTPSHTSFWERViroAFI^^ 
MM I : : « I i MMI IMIM I 

HKDVKSIRCQVASEKVRF-ARFRPFPFWKRVQDSFIGKGE 
240 250 260 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5251> which encodes the amino acid
sequence <SEQ ID 5252>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2190 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS sequences follows: 

18 

» 257/276 (92%) 

Query: 1 VMTQ^4NYTGKVKRVAIIANGKYQSKRVASKIjFSVFKDDPDFYLSKKNPDIVISIGGDGMIJ 60 

VMTQMN+T + RVAIIANGKYQSKRVASKLF+ FK DPDFYLSKK+PDIVISIGGDGMb 
Sbjct: 1 VMTQMNFTDRATRVAIIANGKYQSPCRVASKLFAAFKHDPDFYLSKKDPDIVISIGGDGML 60 

Query: 61 LSAFHMYEKELDKVRFVGIHTGHLGFYTDYRDFEVDKLIDNLRKDKGEQISYPILKVAIT 120 

LSAFHMYEK+LDKVRFVG+HTGHLGFYTDYRDFEVD LI+NL+ DKGEQISYPILKV IT 
Sbjct: 61 LSAFH^fYEKQLDI<VRFVGVHTGHL3FYTDYRDFEVDTLINNLIaroKGEQISYPILICVTIT 120 

Query: 121 LDDGRWKARAIiNEATVKRIEKTiWADVIINHVKFESFRGDGISVSTPTGSTAYNKSLGG 180 

L+DGRV++ARALNE+T+KRIEKTMVADV+IN V FE FRGDGI VSTPTGSTAYNKSLGG 
Sbjct: 121 LEDGRVIRARALNESTIKRIEKTMVADWINQWFERFRGDGILVSTPTGSTAYNKSLGG 180 

Query: 181 AVLHPTIEALQLTEISStNNRVFRTLGSSIIlPKKDKIELVPKRLGIYTISIDNKTYQLK 240 

AVLHPTIEALQLTEISSLNNRV+RTLGSS+IIPKKD IE+VPKR+G+YTTSIDNKT K 
Sbjct: 181 AVLHPTIEALQLTEISSLNNRVYRTLGSSVIIPKKDAIEIVPKRVGVYTISIDNKTVHYK 240 

Query: 241 NVTKVEYFIDDEKIHFVSSPSHTSFWEKVKDAFIGE 276 

NVTK+EY ID++ I+FVS+PSHTSFWERV DAFIGE 
Sbjct: 241 NVTKIEYSIDEKSINFVSTPSHTSFWERVNDAFIGE 276 

SEQ ID 8880 (GBS308) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 57 (lane 4; MW 34kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 77 (lane 3; MW 59kDa). 

GBS308-GST was purified as shown in Figure 226, lane 8.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1692 

A DNA sequence (GBSx1796) was identified in S.agalactiae <SEQ ID 5253> which encodes the amino
acid sequence <SEQ ID 5254>. This protein is predicted to be permease. Analysis of this protein sequence 
reveals the following: 

no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3653 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB06568 GB:AP001516 GTP pyrophosphokinase [Bacillus halodurans] 
Identities = 115/208 (55%), Positives = 159/208 (76%), Gaps = 3/208 (1%) 

Query: 4 DWETFLDPYIQTVGELKIKLRGIRKQFRKQNRHSP1EFVTGRVKSVESIQEKITVLRGISE 63 
+W+ FL PY Q V ELK+KL+GIR^Q++K ++H+PIEFVTGRVK + SI +K + + I

Sbjct: 3 NWDVFLTPYKQAVEEEiKVKLKGIREQYQKSSKHTPIEFVTGRVKPISSILDKAIRKNIPL 62 

Query: 64 ENI^QDLQDIAGLRIMVQFVDDVDEVIiALLRI<RHDMTWQERDYITHMKSSGYRSYHVW 123 
+ L + +QD+AGLRI+ QFV+D++ V+ L+R R D +V+ERDY+ K SGYRSYH+V+ 
Sbjct: 63 DQLEEKMQDLAGLRIVTQFVEDIETWQLIRSRSDFEIVEERDYVEQKKDSGYRSYHLVL 122

Query: 124 EYPVDTIDGQKKVIAEIQIRTIAMNFWATIEHSIaNYKYQGDFPEEIKQRLEKTAKIALEL 183 

YPV TI+G+K++L E+QIRTLAMNFWATIEHSLNYKY G+ P IK RL++ A+ A L 
Sbjct: 123 RYPVQTIEGEKRILVELQIRTLAMNFWATIEHSLNYKYSGE I PLNI KTRLQRAAEAAFRL 182 


Query: 184 DEEMRKIREDIREAQLLFDPLNRKLSDG 211 

DEEM +IR+++REAQ + + RK G 
Sbjct: 183 DEEMSQIRDEVREAQQI ITRKQEQG 207 



A related DNA sequence was identified in S.pyogenes <SEQ ID 5255> which encodes the amino acid
sequence <SEQ ID 5256>. Analysis of this protein sequence reveals the following: 

Possible site: 17 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4064 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 196/223 (87%), Positives = 213/223 (94%) 

Query: 1 MSMDWETFLDPYIQTVGELKIKLRGIRKQFRKQNRHSPIEFl'TGRVKSVESIQEKMVIjRG 60 
M++DWE FLDPYIQTVGELKIKLRGIRKQ+RKQNR+SPIEFVTGRVKS+ESI+EKM+LRG 
Sbjct: 1 MTLDWEEFLDPYIQTVGELKIKLRGIRKQYRKQNRYSPIEFVTGRVKSIESIKEKMILRG 60

Query: 61 ISEEMAQDLQDIAGLRIIWQFVDDVI)EV1ALLRKRHDMTWQERDYITHMKSSGYRSYH 120 

+ EEN+AQD+QDIAGLRIMVQFVDDV+EVLALLR+R DMT+V ERDYI +MKSSGYRSYH 
Sbjct: 61 VIEENIAQDIQDIAGLRIMVQFVDDVEEVLAIiLRQRQDMTIVYERDYIRNMKSSGYRSYH 120 


Query: 121 VWEYPVDTIDGQKKVIAEIQIRTU^FWATIEHSLNYKYQGDFPEEIKQRLEKTAKIA 180 

WVEYPVDTI+GQKKVIAEIQIRTLAMNFWATIEHSIjNYKY GDFPEEIK+RLE TAKIA 
Sbjct: 121 VWEYPVDTIEGQKKVIAEIQIRTLA^FWATIEEJSIJSrYKYGGDFPEEIKKRLE 180

Query: 181 LELDEEMRKIREDIREAQLLFDPLNRKLSDGVGNSDDTDEFYR 223

LELDEEMRKIREDIREAQLLFDP+ R LSDGVGNSDDTDE YR
Sbjct: 181 LELDEEMRKIREDIREAQLLFDPVTRNLSDGVGNSDDTDELYR 223

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1693 

A DNA sequence (GBSx1797) was identified in S.agalactiae <SEQ ID 5257> which encodes the amino
acid sequence <SEQ ID 5258>. Analysis of this protein sequence reveals the following:

Possible site: 33 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2266 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB13015 GB:Z99110 yjbK [Bacillus subtilis]
Identities = 63/184 (34%), Positives = 99/184 (53%), Gaps = 10/184 (5%)



Query: 62 LTLKIPREVGNLEHNHDLT - --LEEAKYIVKNGQFPEDTEIASLILEKGVDPTKLAVFGQL 119 

LTLK P +VG LE + L+ + A + V G P ++ L +D + FG L 

Sbjct: 65 LTLKEPADVGLLETHQQLSEVSDLAGFSVPEG- - PVKDQLHKL QIDTDAIQYFGSL 118 

Query: 120 TTTRREMETSIGLMALDSNIYADIKDYEXELEVKQPKQGKRDFDQFLKENNINFKYAKSK 179 

T R E ET GL+ LD + Y + +DYE+E E +G++ F++ L++ +1 + K+K 

Sbjct: 119 ATNRAEKETEKGLIVLDHSRYLNKEDYEIEFEAADWHEGRQAFEKLLQQFSIPQRETKNK 178 

Query: 180 VARF 183 
+ RF 

Sbjct: 179 ILRF 182 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5259> which encodes the amino acid
sequence <SEQ ID 5260>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3470 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 114/188 (60%), Positives = 139/188 (73%), Gaps = 1/188 (0%)



Query: 1 MTHLEIEYKTLIJSKDEFNRLTSLFSHVQPITQT^FOTETFEMKAHRMSLRIRTLPNRA 60 

MT+LEIEYKTLL K+E+NRL S HV P+TQTNYY DT+ F++KA++MSLRIRT N A 
Sbjct: 1 MmLEIEYKTLLTKNEYNRLLSQMKHOTPVTQTNYYIDTKAFDLKANKM^ 60 

Query: 61 ELTLKIPREVGNLEHNHDLTLEEAKYrVKNGQFPEDTEIASLILEKGVDPTKLAVFGQLT 120 

ELTLK+P +VGN E+N L LE+AK ++K+G PE T + +1+ KG+ P+ L FG LT 
Sbjct: 61 ELTLKVPEKVGNREYWPLFLEQAKDMIKHGNLPESTAL-DIIISKGIKPSALVTFGNLT 119

Query: 121 TTRREMETSIGLMALDSNIYADIKDYELEIiEVKQPKQGKRDFDQFLKENWINFKYAKSKV 180

T RRE IG +ALD N+YA+ KDYELELEV QGK DFD FL E +1 FKYAKSKV
Sbjct: 120 TVFi?ETVIPIGKIJMjDYNLYANTKDYELELEVSDALQGKIDFDSFLSEYHITFKYAKSKV' 179

Query: 181 ARFSATLK 188

AR TLK 
Sbjct: 180 ARCINTLK 187 
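The identity count behind an "Identities = n/m" figure can be recomputed from any pair of aligned rows. The sketch below uses the final, cleanly legible block of the alignment above; `compare_rows` is a hypothetical helper, and it reproduces only exact matches. The "+" marks for conservative substitutions depend on a scoring matrix and are deliberately not modelled here.

```python
def compare_rows(query_row, sbjct_row, gap="-"):
    """Count identities and gap columns for two aligned rows.

    Returns (identities, gaps, midline), where midline repeats the
    residue at identical columns and is blank elsewhere.
    """
    if len(query_row) != len(sbjct_row):
        raise ValueError("aligned rows must have equal length")
    identities = 0
    gaps = 0
    midline = []
    for q, s in zip(query_row, sbjct_row):
        if q == gap or s == gap:
            gaps += 1
            midline.append(" ")
        elif q == s:
            identities += 1
            midline.append(q)
        else:
            midline.append(" ")
    return identities, gaps, "".join(midline)


# Last block of the alignment above (Query 181-188, Sbjct 180-187).
identities, gaps, midline = compare_rows("ARFSATLK", "ARCINTLK")
print(identities, gaps, repr(midline))  # 5 0 'AR   TLK'
```

Summing these per-block counts over a whole alignment gives the numerator of the Identities figure, provided every row is free of scanning errors, which is not the case for several rows in this listing.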

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1694 

A DNA sequence (GBSx1798) was identified in S.agalactiae <SEQ ID 5261> which encodes the amino
acid sequence <SEQ ID 5262>. Analysis of this protein sequence reveals the following: 

Possible site: 30
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1815 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1695 

A DNA sequence (GBSx1799) was identified in S.agalactiae <SEQ ID 5263> which encodes the amino
acid sequence <SEQ ID 5264>. Analysis of this protein sequence reveals the following: 

Possible site: 38
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0621 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1696 

A DNA sequence (GBSx1800) was identified in S.agalactiae <SEQ ID 5265> which encodes the amino
acid sequence <SEQ ID 5266>. This protein is predicted to be ribose-phosphate pyrophosphokinase (prsA). 
Analysis of this protein sequence reveals the following: 

Possible site: 22
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3369 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB11827 GB:Z99104 phosphoribosyl pyrophosphate synthetase
[Bacillus subtilis]
Identities = 166/319 (52%), Positives = 231/319 (72%), Gaps = 4/319 (1%) 



[The Query and Sbjct sequence lines of this alignment were lost in extraction. Surviving block start positions (Query/Sbjct): 61/61, 121/121, 181/179, 240/239, 299/299. Surviving match lines and the final Sbjct line:]

M+ QY DK +K+FSL +N E+A++I+ G+ LGK S +FSDGE+ INIEE++RG D Y
+h AG RV+ LDLHA Q+QGFFDI P+D+L VP+
- RAH LA+ L +PIAIID + 4- E I+G +EGK AI+IDDI++T T AA
GA E+YA +H + 4G A + + ++ I+E++VT4-S+ L +E+
+A+AIIR4HE++ +S LFS
Sbjct: 299 LAEAIIRVHEQQSVSYLFS 317

A related DNA sequence was identified in S.pyogenes <SEQ ID 5267> which encodes the amino acid 
sequence <SEQ ID 5268>. Analysis of this protein sequence reveals the following: 

Possible site: 22
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1830 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 278/324 (85%), Positives = 305/324 (93%)

Query: 1 MAEQYADKQIKLFSLTANREIAEKISQASGIPLGKMSSRQFSDGEIMINIEETVRGDDIY 60
M E+YADKQIKLFSLT+N IAEKI++A+GIPLGKMSSRQFS+GEIMINIEETVRGDDIY
Sbjct: 1 MTERYADKQIKLFSLTSNLPIAEKIAKAAGIPLGKMSSRQFSNGEIMINIEETVRGDDIY 60

Query: 61 IIQSTSFPVNDNLWELLIMIDACKRASANTVNIWPYFGYSRQDRIAASREPITAKLVAN 120
IIQSTSFPVNDNLWELLIMIDACKRASANTVNIV+PYFGYSRQDR+A REPITAKLVAN

[The remaining sequence lines up to Sbjct 241 were lost in extraction; surviving match lines for the blocks starting at 121, 181 and 241:]
ML KAG+DRV4-TLDLHAVQVQGFFDI PVDNLFTVPLFAE Y++LGLSG DWWSPKNSG
IKRARSLAEYLDSPIAIIDYAQDDSERE+GY-IG+V GKKAI + IDDILNTGKTFAEAAKI
LER GAT+ YAVASHGLFAGGAAD+LE+API +EI IVTDSV +K R+P H4- YL4-AS LIA

Sbjct: 241 LERSGATDTYAVASHGLFAGGAADVLETAPIKEIIVTDSVKTKNRVPENVTYLSASDLIA 300

Query: 301 DAIIRIHERKPLSPLFSYRSDKKD 324 

+AIIRIHER+PLSPLFSY+ K+ 
Sbjct: 301 EAIIRIHERRPLSPLFSYQPKGKN 324 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
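
Each alignment in this section is introduced by a summary line giving identities, positives and gaps as fractions of the alignment length. As a hedged illustration (the helper name `parse_summary` is invented here and is not part of BLAST), the fractions can be recomputed from such a line to check the printed percentages:

```python
import re

# One "Name = count/length" field per match, e.g. "Identities = 166/319".
FIELD_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")

def parse_summary(line):
    """Return {field: (count, length, percent)} from an alignment summary line."""
    stats = {}
    for name, count, length in FIELD_RE.findall(line):
        count, length = int(count), int(length)
        stats[name] = (count, length, 100.0 * count / length)
    return stats

# Summary line of the Bacillus subtilis hit reported in Example 1696.
line = "Identities = 166/319 (52%), Positives = 231/319 (72%), Gaps = 4/319 (1%)"
stats = parse_summary(line)
for name, (count, length, pct) in stats.items():
    print(f"{name}: {count}/{length} = {pct:.1f}%")
```

A recomputed value that disagrees with the printed percentage usually points to a digit damaged during extraction rather than to an error in the original search output.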

Example 1697 

A DNA sequence (GBSx1801) was identified in S.agalactiae <SEQ ID 5269> which encodes the amino
acid sequence <SEQ ID 5270>. This protein is predicted to be Fe-S cluster formation protein. Analysis of 
this protein sequence reveals the following: 

Possible site: 16
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1981 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

formation protein [Bacillus halodurans]
237/373 (62%), Gaps = 6/373 (1%)

Query: 3 IYLDNAATTALTPSVIEKMTNVMTSNYGNPSSIHTFGRQANQLLRECRQI IAEYLNVNSR 62 

IYLD+AAT+ + P VI + M +GNPSSIH FGR+A Q + E R IA L + 

Sbjct: 4 IYLDHAATSPVHPEVIQAMLPYYEEQFGNPSSIHQFGRRARQGVDEARGTIARLLQADPS 63 

Query: 63 EIIFTSGGTESNWTAIKGYALANQLKGKHIITSEIEHHSVLHTMTYLSERFGFDITYLKP 122 

E IFTSGGTE++N AI GYA ++ KG HI1TS++EHH+VUI L E GF++TY+ 
Sbjct: 64 E FI FTSGGTEADNIAI FGYAYQHRGKGNHI ITSQVEHHAVLHACQEL - EHQGFEVTYVPV 122 

Query: 123 NH-GQITAKDVQEALRDDTIMVSLMFVNNETGDFLPIQEIGQLLRNHQAVFHVDAVQVFS 181 

+ +DV++ALRDDTI+V+LM+ NNE G PI EIG LL++HQAV H DAVQ F 

Sbjct: 123 DQTGRVSVEDVRQALRDDTILVTLMYGNNEVGTIQPIAEIGALLQDHQAVLHTDAVQAFG 182 

Query: 182 KMELDPHSLGIDFLAASAHKFHGPKGVGILYCAPH-HFDSLLHGGDQEEJCRRASTENIIG 240 

+ ++ L +D L+ SAHK +GPKGVG+LY L+GG+QE K+RA TEN+ 

Sbjct: 183 AISIELDHLPVDMLSVSAHKINGPKGVGLLYVRDGIVLKPALYGGEQERKKRAGTENVAA 242 

Query: 241 IAGMSQALTDATTNTLKNWTHI SQLRTTFLDAISD - - LDFYLNNGQDC -LPHVLNIGFPG 297 

I G ++A+ A N + TFD +F+NQ LPH+ N+ FPG 

Sbjct: 243 IIGFAKAVEIAIANREERQKAYFDYCQTFFDQFQQEGVQFVMNGHQTWRIjPHIFNVSFPG 302 

Query: 298 QNNGLLLTQLDI^GFAVSTGSACTAGT\7EPSH\TLiTSLYt3ANSPRLNESIRISFSEljNTQE 357 

+ LL LDLAG A S+GSACTAG++EPSHVL +++G++S + +R SF NT+E 
Sbjct: 303 VHVEALLVNLDLAGIAASSGSACTAGSIEPSHVLVAMHGSDSELVTSGVRFSFGLGNTKE 362 

Query: 358 EILELAKTLRKI I 370 

+ AK KI+ 
Sbjct: 363 HVQWAAKETAKIV 375 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5271> which encodes the amino acid
sequence <SEQ ID 5272>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1477 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 235/370 (63%) , Positives = 285/370 (76%) 

Query: 2 MIYLDNAATTALTPSVIEKMTNVMTSNYGNPSSIHTFGRQANQLLRECRQIIAEYLNVNS 61 

M Y DN&&TT L+E+VI MT M N+GNPSSIH +GR+AN++LRECRQ IA L + 
Sbjct: 1 MTYFDNAATTPLSPNVIRAMTAAMQDNFGNPSSIHFYGRRANKILRECRQAIARNLGASE 60 

TSGGTESNN AIKGYAIA+Q KGKH+IT+ IEHHSVLHTM YL ERFGF++TYL 
Sbjct: 61 QQIIVTSGGTESNNMAIKGYALAHQAKGKHLITTTIEHHSV1HTMAYLEERFGFEVTYLP 120 

Query: 122 PNHGQITAKDVQEALRDDTIMVSLMF\™ETGDFLPIQEIGQLLRNHQAVFHVDAVQVFS 181 

+GQI D++4ALRDDTI+VS+M+ NNETGD LPI+4IG LL++HQA FHVDAVQ 
Sbjct: 121 CQNGQINLSDLKQALRDDTILVSIMYANNETGDLLPIKDIGNLLKDHQAAFHVDAVQAVG 180 

Query: 182 KMELDPHSLGIDFLAASAHKFHGPKGVGILYCAPHHFDSLLHGGDQEEKRRASTENIIGI 241 

K+++ P LGIDFL+ASAHKFHGPKG G LY D LLHGGDQE KRRASTEN++GI 

Sbjct: 181 KLKIIPSELGIDFLSASAHKFHGPKGCGFLYSNGQPIDPLLHGGDQEGKRRASTENMLGI 240 

Query: 242 AGMSQALTDATIOTLHSTOTHISQLRTTFLDAISDLDFYLNNGQDCLPHVLNIGFPGQNNG 3 01 
GM+QALTDA T ++ HI LR + + L +Y+N G LPHVLNIGF G N 

Query: 302 LLLTQLDIAGFAVSTGSACTAGTVEPSHVLTSLYGANSPRLilESIRISFSELNTQEEILE 361 

+LLTQLDIAG AVSTGSACTAG V PSHVL + YG +S RB ESIRISFS+ N+ E++ + 
Sbjct: 301 ILLTQLDIiAGIAVSTGSACTAGAVNPSHVLAAYYGDDSSRLKESIRISFSDQNSIEDVNQ 360 

Query: 362 LAKTLRKIIG 371 

IA+TL+ I+G 
Sbjct: 361 IAQTLKNILG 370 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1698 

A DNA sequence (GBSx1802) was identified in S.agalactiae <SEQ ID 5273> which encodes the amino
acid sequence <SEQ ID 5274>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2753 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12416 GB:Z99107 ydiH [Bacillus subtilis] 
Identities = 96/202 (47%) , Positives = 140/202 (68%) , Gaps = 4/202 (1%) 

Query: 67 DVKKLMNFFAEILNDHSTTNvWLVGCGNIGRALLHYRFHDRNKMQISMAFDLDSNDLVGK 126

+V L++FF + L+ T+V+L+G GN+G A LHY F N +ISMAFD++ + + 
Sbjct: 68 NVDYLLSFFRKTLDQDEMTDVILIGVGNLGTAFLtrYlIFTKNNNTKISMAFDINESKI - -G 125 



Query: 127 TTEDGIPVYGISTINDHLIDSDIETAILTVPSTEAQEVADILVKAGIKGILSFSPVHLTL 186

T G+PVY + + H+ D + AILTVP+ AQ + D LV GIKGIL+F+P Ii + 
Sbjct: 126 TEVGGVPVYNLDDLEQHVKDESV--lU:LTVPAVi^QSITDRLVALGIKGILt^EJ^IiJV 183

Query: 187 PKDI I VQYVDLTSELQTLLYFM 208 

P+ 1 + ++DL ELQ+L+YF+ 
Sbjct: 184 PEHIRIHHIDLAVELQSLVYFL 205 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5275> which encodes the amino acid 
sequence <SEQ ID 5276>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2313 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 167/210 (79%) , Positives = 189/210 (89%) 

Query: 1 MIMDKSIPICATAKRLSLYYRIFKRFNTDGIEICASSKQIADALGIDSATVRRDFSYFGELG 60 

+++DKSIPKATAKRLSLYYRIFKRF+ D +EKASSKQIADA+GIDSATVRRDFSYFGELG 
Sbjct: 1 WIDKSIPKATAKRLSLYYRIFKRFHADQVEKASSKQI7ADAMGIDSATVRRDFSYFGELG 60 

Query: 61 RRGFGYDVXKLNINFFAEILNDHSTTNvMLVGCGNIGRALLHYRFHDRNKMQISMAFDLDS 120 

RRGFGYDV KLMNFFA++LNDHSTTNV+LVGCGNIGRALLHYRFHDRNKMQI+M FD D 
Sbjct: 61 RRGFGYDVTKLMNFFADLLNDHSTTNVILVGCGNIGRALLHYRFHDRNKMQIAMGFDTDD 120 

Query: 121 ISTOLVGKTTEDGIPVYGISTIITOHLIDSDIETAILTVPSTEAQEV'ADILVKAGIKGILSFS 180 

N LVG T D 1PV+GIS++ + + ++DIETAILTVPS AQEV D L++AGIKGILSF+ 
Sbjct: 121 NALVGTKTADNIPVI-IGISSVKERIANTDIETAILTVPSII-IAQEVTDQLIEAGIKGILSFA 1B0 

Query: 181 PVHLTLPKDI I VQYVDLTSELQTLLYFMNQ 210 

PVHL +PK +IVQ VDLTSELQTLLYFMNQ 
Sbjct: 181 PVHLQVPKGVIVQSVDLTSELQTLLYFMNQ 210 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1699 

A DNA sequence (GBSx1803) was identified in S.agalactiae <SEQ ID 5277> which encodes the amino
acid sequence <SEQ ID 5278>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2966 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9847> which encodes amino acid sequence <SEQ ID 9848> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



Query: 24 PRERIiVDLGADRLSNQELLAI LLRTG I KEKPVL3 1 STQ I LENI SSLADFGQLSLQELQS I 83 

PRERL+ +GA+ L+N ELLAILLRTG K + VL++S ++L + L + S++EL SI 
Sbjct: 19 PRERLLKVGAENIANHELLAILLRTGTKHESVLDLSNRLLRSFDGLRLLKEASVEELSSI 78 
Query: 84 Kt3IGQVKSVEIKAMLEIAKRIHKAEYERKEQILSSEQLARKMMLELGDKKQEHLVAIYMD 143

GIG VK+++I A 4EB RIHK + I S E A +M ++ QEH V +Y++ 
Sbjct: 79 PGIG^^VIOiIQILARVELGSRIHKI)A^EHFVIRSPEDGANLVMEDMRFLTQEHFVCLYLN 138 

Query: 144 TQNRIIEQRTIFIGTVRRSVAEPREILHYACKIMATSLIIIHNHPSGSPKPSESDLSFTK 203 

T+N++I +RT+FIG++ S+ PRE+ A K A S I +HNHPSG P PS D+ T+ 
Sbjct: 139 TKNQV1HKRTVFIGSLNSSIVHPREVFKEAFICRSAASFICVHNHPSGDPTPSREDIEVTR 198 

Query: 204 KIKRSCDHLGIVCLDHIIVGKNKYYSFREE 233 

++ + +GI LDH+++G K+ S +E+ 
Sbjct: 199 RLFECGNLIGIELLDHLVIGDKKFVSLKEK 228



A related DNA sequence was identified in S.pyogenes <SEQ ID 5279> which encodes the amino acid 
sequence <SEQ ID 5280>. Analysis of this protein sequence reveals the following: 
Possible site: 59
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3307 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 145/225 (64%) , Positives = 182/225 (80%) 



Query: 12 MYHIELKKEALLPRERLVaLGADKLSNQELLAILLRTGIKEKPVL2ISTQILENISSLAD 71
MY 1+ +PRERL+ LGA+ LSNQELLAILLRTG KEK VLE+S+ +L ++ SLAD
Sbjct: 1 MYSIKCDDNKAMPRERLMRLGAESLSNQEL1AILLRTGNKEKHVLELSSYLLSHLDSIAD 60

[The sequence lines of the remaining blocks (Query 72/Sbjct 61, Query 132/Sbjct 121, Query 192/Sbjct 181) were lost in extraction; surviving match-line fragments:]
++SLQELQ + GIG+VK++EIKAM+EL RI
- +L+S Q+A KMM LGD
5+AEPREIL+YACKNMATSLI+IHNHPSG+
3 +D FT+KIKRSC+

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1700

A DNA sequence (GBSx1804) was identified in S.agalactiae <SEQ ID 5281> which encodes the amino
acid sequence <SEQ ID 5282>. This protein is predicted to be a permease. Analysis of this protein 
sequence reveals the following: 



Possible site: 29
>>> Seems to have an uncleavable N-term signal seq
[Several INTEGRAL/Transmembrane entries followed; their likelihood values were lost in extraction. Surviving coordinate fragments: 175 - 192, 317 - 333, 255 - 290, 170 - 194, 236 - 257, 282 - 308]

Final Results

bacterial membrane --- Certainty=0.4142 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>


The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC05771 GB:AF051356 putative permease [Streptococcus mutans] 
Identities = 88/356 (24%) , Positives = 175/366 (47%) , Gaps = 27/356 (7%) 

Query: 3 FEKRQVYYWITFAICYAIQAYW GAVSNILTTIjHKAI F- PFLMGAGIAYI INIVMSV 58
F+ ++++ + + I W G++ N ++ K F PFL+G + Yl N +++
Sbjct: 2 FKSSKLFFWTVEILLVTLILFIWHQMGSIFNPFFSVAJCTFFLPFLLGGFLYYITNPIVTF 61

[Most sequence lines of the remaining blocks (Query/Sbjct starting at 59/62, 119/111, 179/166, 235/223, 295/283 and 355/343) were lost in extraction; surviving fragments:]
Sbjct: SNPAFKNI DI PVLLKQFNLSYVDILTNVLDS VTVSVSS IVYMI 165
Query: IYVLANKEQLGRQFNLLIDTYLGSTGKTFHYVRHILHQRFHGFFVS 234
Y+L +K+ L +L T L + + + +++ +
Sbjct: ILFYLLKDKDGL-- -MPMLDRTILKNDRHNISQLLNQMNKTISRYISG 222
T +IP VG Y+G+T
- LQQ +GN4+YP+WG +
I- +GG + G++GML+AVP A


A related DNA sequence was identified in S.pyogenes <SEQ ID 5283> which encodes the amino acid
sequence <SEQ ID 5284>. Analysis of this protein sequence reveals the following: 



>>> Seems to have an uncleavable N-term signal seq
[Likelihood values were lost in extraction; surviving coordinates, one line per INTEGRAL entry:]
INTEGRAL Likelihood = Transmembrane - 103 ( 83 -
INTEGRAL Likelihood = Transmembrane 178 - 194 ( 166 - 202)
INTEGRAL Likelihood = Transmembrane 278 - 294 ( 256 -
INTEGRAL Likelihood = Transmembrane 299 - 315 ( 295 -
INTEGRAL Likelihood = Transmembrane 14 - 30 ( 13 -
INTEGRAL Likelihood = Transmembrane 340 - 356 ( 333 -
INTEGRAL Likelihood = Transmembrane 258 - 274 ( 256 -

Final Results

bacterial membrane --- Certainty=0.4482 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 



Query: 10 FEKKQVFYLVLTFILCYGILANMRNGTAIVTTIYKTS LPFFYGAAGAYIVNIVMSA 65 

F+ ++F+ + +h IL WR +1 + + LPF G YI N +++ 
Sbjct: 2 FKSSKLFFWTVEILLVTLILFIWRQMGSIFNPFFSVAKTFFLPFLLGGFLYYITNPIVTF 61 



Query: 66 YEKVYVYIFKDWSHVLKVKRGICLLLAYLTFFILITWIISIVIPDLITSISTLTKFDT-- 123 
^ 1+ +IP+LI ++ L

[Most sequence lines of the remaining blocks were lost in extraction. Surviving block start positions: Sbjct 62; Query 124/Sbjct 111; Query 183/Sbjct 159; Query 235/Sbjct 216; Query 295/Sbjct 276; Query 355. Surviving fragments:]
- +LTN+L SVTV
Query: 183 SAIINLFISFVFSL YVIASKEDLCRQGNTLVDTYTGXYAKRIHYLLELLHQR 234
S+I+ + + V L Y+L K+ L L T I LL +++
Sbjct: 159 SSIVYMITNTVMILVLTPVILFYLLKDKDGLMPM LDRTILKNDRHNISQLLNQMNKT 215
T +IP +G +G
Y II II+++ LQQI+GN +YP+WG 4



An alignment of the GAS and GBS proteins is shown below. 

Identities = 218/370 (58%), Positives = 291/370 (77%) 

Query: 1 MKFEKRQVYYWITFAICYAIQAYWGAVSNILTTLHKAIFPFLMGAGIAYIINIVMSVYE 60 

MKFEK+QV+Y+V+TF +CY I A W + I+TT++K PF GA AYI+NIVMS YE 
Sbjct: 8 MKFEKKQVFYLVLTFILCYGIIANVTRNGTAIVTTIYK'TSLPFFYGAAGAYIVNIVMSAYE 67 

Query: 61 RLYIKLFKGSRLLMAIKRSVSMILSYATFIGLIVWLFSIVIPDLISSLSSLLVIDTGALA 120 

++ Y+ +FK ++ +KR + ++L+Y TF LI W+ SIVIPDLI+S+S+L DT + 
Sbjct: 68 KVYVYIFKDWSHVLKVKRGICLLLAYLTFFILITWIISIVIPDLITSISTLTKFDTITIQ 127 

Query: 121 KLVNftTLNENKQISEVLNYMGTDKDLVSTLSGYSQQIL^ 180 

++VNNL NK ++ + Y+G D L T++ YSQQ+LKQ L+VLTN+LTSV+ IA+ ++N 
Sbjct: 128 EVVNNLEHNKLLARTIQYIGGDGKLTETIANYSQQLLKQFLTVLTNILTSVTVIASAIIN 187 

Query: 181 VFVSFIFSIYVLAI<rKEQLGRQFNLLIDTYLGSTGKTFHYVRHILHQRFHGFFVSQrLEAM 240 

+F+SF+FS+YVLA4-KE h RQ N L+DTY G K HY+ +LHQRFHGFFVSQTLEAM 
Sbjct: 188 LFISFVFSLYVLASKEDLCRQGOTLVDTYTGKYAKRIHYLLELLHQRFHGFFVSQTLEAM 247 

Query: 241 ILGSLTVIGMLI FQFPYALTVGVLVAFTALI PWGAYIGVTIGFILIATESLTEAFLFVL 300 

ILGSLT GM I + P+A T4GVLVAFTALIPV+GA IG IGFIL1 T+S+++A +F++ 
Sbjct: 248 ILGSLTASGMFILRLPFAGTIGVLVAFTALIPVIGASIGAAIGFILIMTQSMSQAIIFII 307 

Query: 301 FLILLQQFEGWIYPEOWGGSIGLPSMWVLMAITIGGALWGILGMLLAVPVAATIYQIvK 360 

FLI+LQQ EGN IYPKWGGSIGLP+MWVLMAITIG +L GI+GM++AVP+AAT+YQ++K 
Sbjct: 308 FLIILQQIEGNFIYPKWGGSIGLPAMWVLMAITIGASLKGIVGMIIAVPLAATLYQVIK 367 

Query: 361 DHIIKRQTLR 370 

D+I KRQ ++ 
Sbjct: 368 DNIQKRQAIQ 377 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
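
The "Identities" figure quoted for each alignment is the number of columns in which the Query and Sbjct residues are the same. A small sketch of that count, assuming two aligned strings of equal length with "-" marking gaps, is shown here; the function name is illustrative. "Positives" additionally depend on a substitution matrix and are not reproduced by this sketch.

```python
def identity_stats(query, subject):
    """Count identical columns in two equal-length aligned sequences ('-' = gap)."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return identical, len(query), 100.0 * identical / len(query)

# Final block of the GAS/GBS alignment in Example 1700
# (Query 361-370 against Sbjct 368-377).
query = "DHIIKRQTLR"
subject = "DNIQKRQAIQ"
identical, length, pct = identity_stats(query, subject)
print(f"Identities = {identical}/{length} ({pct:.0f}%)")
```

Summing the per-block counts over a whole alignment gives the numerator of the reported "Identities" fraction, provided the sequence lines themselves are intact.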

Example 1701 

A DNA sequence (GBSx1805) was identified in S.agalactiae <SEQ ID 5285> which encodes the amino
acid sequence <SEQ ID 5286>. Analysis of this protein sequence reveals the following: 
Final Results

bacterial cytoplasm --- Certainty=0.1081 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9849> which encodes amino acid sequence <SEQ ID 9850> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA69226 GB:U29579 6-phospho-beta-glucosidase [Escherichia coli] 
Identities = 290/478 (60%), Positives = 369/478 (76%), Gaps = 2/478 (0%) 



Query: 2 MVKQVFPKGFLWGGATAflNQCEGAYNVDGRGLANVDWPTGEDRFAIISGQKKMFDFEEG 61
M VFP+ FLWGGA AANQ EGA+ +GL VD++P GE R A+ G +K F +
Sbjct: 1 MKMSVFPESFLWGGALAANQSEGAFREGDKGLTTVDMIPHGEHRMAVKLGLEKRFQLRDD 60

Query: 62 YFYPAKESIDFYHHYKEDLALLAEMGFKTYRMSIAVITRIFPKGDELYPNEAGLQFYENIF 121
FYP+ E+ DFYH YKED+AL+AEMGFK +R SIAW+R+FP+GDE+ PN+ G+ FY ++F
Sbjct: 61 EFYPSHEATDFYHRYKEDIALMAEMGFKVFRTSIAWSRLFPQGDEITPNQQGIAFYRSVF 120

Query: 122 KECRKYGIEPLVTITHFDCPIYLIKHYGGWRSRKMIGFYERLvRALFTRFKGLVKYWLTF 181
+EC+KYGIEPLVT+ HFD P++L+ YG WR+RK++ F+ R R F F GLVKYWLTF
Sbjct: 121 EECKKYGIEPLVTLCHFDVPMHLOTEYGSWRNRKLVEFFSRYARTCFEAFDGLVKYWLTF 180

Query: 182 NEINMILHAPFMGAGLYFEDGENQEQIKYQAAHHELVASAIAVKIAHEVDPNNQIGCMLA 241
NEIN++LH+PF GAGL FE+GENQ+Q+KYQAAEH+LVASA+A KIAHEV+P NQ+GCMLA
Sbjct: 181 NEINIMLHSPFSGAGLVFEEGENQDQVKYQAAHHQLVASALATKIAHEVNPQNQVGCMLA 240

[The Query and Sbjct lines of the block starting at Query 242/Sbjct 241 were lost in extraction; surviving match line:]
G +YP +C P+D WA+++K+REN FFIDVQARG YP Y+ H

Query: 302 LRDYTVrjFISFSYYSSRVASGNPTVSEQVQENIPASLKNPYLKSSEWGWQIDPLGLRITL 361
L++ TVDF+SFSYY+SR AS + N+ SL+NPYL+ S+WGW IDPIX3LRIT+
Sbjct: 301 LKN-TVDFVSFSYYASRCASAENINANNSSAANWKSLRNPYLQVSDWGWGIDPLGLRITM 359

Query: 362 NAIWDRYQKPMFIVENGLGAVDIPDENGYVEDDYRIDYLRQHIAAMRDAIYVDGVNLIGY 421
N ++DRYQKP+F+VENGLGA D NG + DDYRI YLR+H1 AM +AI DG+ L+GY
Sbjct: 360 NMMYDRYQKPLFLVENGLGAKDEFAANGEINDDYRI SYLREHIRAMGEAI -ADGIPLMGY 418

Query: 422 TTWGCIDLVSAGTGEMEKRYGFIYVDRNNKGEGTLKRYKKKSFYWYKKVIASNGSQIE 479
TTWGCIDLVSA TGEM KRYGF++VDR-I-+ G GTL R +KKSF+WYKKVTASNG +E
Sbjct: 419 TTWGCIDLVSASTGEMSKRYGFVFVDRDDAGNGTLTRTRKKSFWWYKKVIASNGEDLE 476

There is also homology to SEQ ID 5288.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 1702 

A DNA sequence (GBSx1806) was identified in S.agalactiae <SEQ ID 5289> which encodes the amino
acid sequence <SEQ ID 5290>. This protein is predicted to be platelet-activating factor acetylhydrolase
isoform Ib beta subunit, pu. Analysis of this protein sequence reveals the following:

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5323 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC27974 GB:AF016048 platelet-activating factor acetylhydrolase
alpha 2 subunit [Rattus norvegicus]
Identities = 43/177 (24%), Positives = 84/177 (47%), Gaps = 9/177 (5%) 

Query: 28 QEGAIVFTGDSIVEF FPLKKHLGRDYPLVNRGVAGSDTYWLLENLRTQVVffiLLPSKV 84 

+E ++F GDS+V+ + + + L +N G+ G T +L L+ E + KV 

sbjct: 38 kepdvlfvgdswqlmqqyeiwrelfsplhaimfg:ggdttpjwlwrlkngelenikpkv 97 

Query: 85 FIL-IGTMDIGLGHSQSEIIANITDIIAEIRAESYMTEINILSVLPVSEEDDYIERVKVR 143 

++ +GTN+ ++ E+ I 1+ I +1 +L +LP E+ + + + + 

Sbjct: 98 IVVWVGTHNHE--NTAEEVAGGIEAIVQLINTRQPQAKIIVLGLLPRGEKPNPLRQKNAK 155 

Query: 144 NNQTIKALNKTLSVISGINYIELYDLLVDEKGQLASSFTKDGLHLTDQAYAKISETI 200 

NQ +K +L ++ + +++ V G ++ D LHLT YAKI 4 + 

Sbjct: 156 VNQLLKV SLPKLANVQLLDIDGGFVHSDGAISCHDMFDFLHLTGGGYAKICKPL 209 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5291> which encodes the amino acid
sequence <SEQ ID 5292>. Analysis of this protein sequence reveals the following:

Possible site: 35
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5979 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 92/204 (45%) , Positives = 133/204 (65%) 

Query: 1 MLEVIDKALRDYQMKREQFFEINNQTVQEGAIVFTGDSIVEFFPLKKHLGRDYPLVNRGV 60

MLE++ + LR YQ ++ + NQ +G IVF GDS++EFFPLKK G P++NRG+ 
Sbjct: 1 MLEIVSEELRHYQEQKLIEYRNKNQLAPKGGIVFAGDSLIEFFPLKKAFGSCLPIINRGI 60 

Query: 61 AGSDTYWLLENLRTQVWELLPSKVFILIGTNDIGLGHSQSEIIANITDIIAEIRAESYMT 120

AG D+ WLL + Q+ +L P +F+LIG NDIGLG+ + 1+ I ++I++IR+ + 
Sbjct: 61 AGIDSQWLLRHFSVQITDLEPKHIFLLIGCNDIGLGYDKCHIVKTIVELISQIRSHCVYS 120 

Query: 121 EINILSVLPVSEEDDYIERVKVRNNQTIKALNKTLSVISGINYIELYDLLVDEKGQLASS 180 
4Q +1 +I1S+LPVS Y + VK+R N I A+NK L++I + +1 L L DEKG L+ 

Sbjct: 121 QIYLLSLLPVSNNPRYQKTVKIRTNAMIDAINKDIAMIPTVEFINLNTCLKDEKGGLSDE 180 

Query: 181 FTKDGLHLTDQAYAKISETIKLYL 204
T DGLHL AYAK++E IK Y+
Sbjct: 181 NTLDGLHLNFPAYAKLAEIIKSYI 204

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1703 

A DNA sequence (GBSx1807) was identified in S.agalactiae <SEQ ID 5293> which encodes the amino
acid sequence <SEQ ID 5294>. Analysis of this protein sequence reveals the following:

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5226 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
A related GBS nucleic acid sequence <SEQ ID 9851> which encodes amino acid sequence <SEQ ID 9852>
was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA35556 GB:DS0723 Hypothetical 30.2 kd protein in idh-deoR 
intergenic region. [Escherichia coli] 
Identities = 104/265 (39%), Positives = 154/265 (57%), Gaps = 4/265 (1%) 

IKLIATDMDGTFLRSDKTYDICARFSSLLTLMEKyDIKFVAASGNLyDQLLLl\[FLEYPNRI 61
IKLIA DMDGTFL KTY++ RF + M+ I+FV ASGN Y QL+ F E N I

[The remaining sequence lines were lost in extraction. Surviving Sbjct block start positions: 4, 64, 122, 180 and 240 (Query blocks at 122, 182 and 242). Surviving match lines and the final Sbjct line:]
A+VAENGG V+ 4
Y D+ L+D +F+ L +++ L+ -
3 G ID4+ V+KA+G+ L + WG+ +V+VFGDGGND+EMLR A S+AM NA
- A AKY+ SN+++GVL+ 1+ L
Sbjct: 240 ATAAAKYRAGSNNREGVLDVIDKVL 264

There is also homology to SEQ ID 1158. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1704 

A DNA sequence (GBSx1808) was identified in S.agalactiae <SEQ ID 5295> which encodes the amino
acid sequence <SEQ ID 5296>. This protein is predicted to be transcriptional regulator (AraC/XylS family).
Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4984 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 5 DNLLSHNLEDNRHLLPYEHMHTEVRNGYPDILFHWHPELEISYVHEGTARYHIDYDFFNS 64 

D H + + LLPY4 T + NGYPD LFHWHPELEISY++EGTA+YHIDYD+FNS 
Sbjct: 10 DENFKHEINFDNDLLPYKIYQTTIANGYPDTLFEWHPELEISYIYEGTAQYHIDYDYFNS 69 

Query: 65 QSGDIILIRPNGMHSIHPIENKEHITDSIKFHLDLIGYSIVDQVSLRYLQPLQTSSFKFI 124 

Q+ DIIL+RPNGMHSIHPI+NK ++ FHLDL+GYS++DQ+SLRYLQPLQ S+FK + 

Sbjct: 70 QTDDIILWPNGMHSIHPIKNKMQKAOTIiLFHLDLVGYSLLDQISLRYLQPLQNSTFKIjV 129 

Query: 125 QCIKPSMTGYNDIKNCLFDIFNISKEENRHFELLLKAKLNELLYLLYYHQYVIKKHTDDT 184 

CIKP M GY DIKNCLF IF+I + + RHFELLLKAKL EL+ YI1I1Y+HQYV+ +KH+DD 
Sbjct: 130 PCIKPDMLGYQDI KNCLFAI FDIYQRQGRHFELLLJCAKLQELIYLLYFHQYVLRKHSDDM 189 
Query: 185 YRKJffiRIRDLIDYIN^YQQNLTIEFLMJYI^GySKTKFHTVFKQHTGTSCTEFIIQVRLN 244

YRKNE+IR+LIDYI+ +YQ+ L+I IAD +GYSKTHFMTVFKQHTGTSCT+FI IQ RL+ 
Sbjct: 190 YRKNEKIRELIDYIHQHYQEKLSIISLADIIGYSKTHFMTVFKQHTGTSCTDFIIQFRLS 249 

Query: 245 KASEHLINSTTAIIDIANSVGFjNNLSNFNRQFKRYYHTTPRQYRRQF 291 

KA + L+NS I+++A4- VGF NLSNFNRQFKRYY TP QYRRQF 
Sbjct: 250 KACDLLVNSIKPILEVASEVGFTNLSNFNRQFECRYYQITPSQYRKQF 296 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5297> which encodes the amino acid 
sequence <SEQ ID 5298>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 43/169 (25%), Positives = 83/169 (48%), Gaps = 16/169 (9%)

Query: 136 DIKNCLFDIFNISKEENRHFELLLKAKHJELLYLLYYHQYV IKKHTDDTYRKN- 188 

D+K+ F +F+ 4- R F +L K ++ ++ Q + +KK D T + N 

Sbjct: 319 DVKHVSFLLFS DIYRQFPILDKMTYLSMVKTIHDSQSIDCILRELKKVLDVTNQNNS 375

Query: 189 ERIRDLIDYINNNYQQNLTIEFLADYMGYSKTHFMTVFKQHTGTSCTEFIIQW 242 

+ + + ID I Y Q LT++ +AD + + + FK T S T+-I-+ VR 
Sbjct: 376 PEKRYSDLVSETIDCIRKEYHQELTLKAIADRLHWGVYLGQCFKNETERSFTOYLNHVR 435 

Query: 243 LNKASEHLINSTTAIIDIANSVGFNNLSNFNRQFKRYYHTTPRQYRKQF 291

+ KA + L+ + +1 +IA G+N F + FK+ +P+H-+R ++ 
Sbjct: 436 I QKAQQLLLYTNQS INEIAYETGYNTNHYFI KMFKKLNGLS PKEFRDRY 484 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1705 

A DNA sequence (GBSx1809) was identified in S.agalactiae <SEQ ID 5299> which encodes the amino
acid sequence <SEQ ID 5300>. Analysis of this protein sequence reveals the following: 

Possible site: 34
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3705 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
Example 1706

A DNA sequence (GBSx1810) was identified in S.agalactiae <SEQ ID 5301> which encodes the amino
acid sequence <SEQ ID 5302>. Analysis of this protein sequence reveals the following:

Possible site: 39
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood =-11.25 Transmembrane 59 - 75 ( 56 - 82)
INTEGRAL Likelihood = -7.48 Transmembrane 23 - 39 ( 12 - 41)
INTEGRAL Likelihood = -6.64 Transmembrane 231 - 247 ( 225 - 255)
INTEGRAL Likelihood = -5.15 Transmembrane 335 - 351 ( 333 - 355)
INTEGRAL Likelihood = -4.19 Transmembrane 309 - 325 ( 305 - 327)
INTEGRAL Likelihood = -4.14 Transmembrane 272 - 288 ( 268 - 292)
INTEGRAL Likelihood = -4.04 Transmembrane 402 -     ( 400 - 419)
INTEGRAL Likelihood = -3.88 Transmembrane 191 - 207 (     - 208)
INTEGRAL Likelihood = -2.71 Transmembrane 365 - 381 ( 364 - 381)
INTEGRAL Likelihood = -1.86 Transmembrane 165 - 181 ( 164 - 182)



Final Results

bacterial membrane --- Certainty=0.5501 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
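
Each INTEGRAL line reports one predicted membrane-spanning segment as a likelihood, a core residue range and a wider range in parentheses. As a rough sketch, assuming that one-line layout (the regular expression and function name are illustrative, not taken from the prediction program), the segments can be collected and ordered along the sequence like this:

```python
import re

# One predicted segment per line: likelihood, core range, extended range in parentheses.
SEGMENT_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_segments(text):
    """Return one dict per fully legible INTEGRAL line; damaged lines are skipped."""
    segments = []
    for m in SEGMENT_RE.finditer(text):
        likelihood = float(m.group(1))
        start, end, ext_start, ext_end = (int(g) for g in m.groups()[1:])
        segments.append({
            "likelihood": likelihood,
            "core": (start, end),
            "extended": (ext_start, ext_end),
        })
    return segments

# Three of the entries reported for the GBS protein of Example 1706.
sample = """
INTEGRAL Likelihood =-11.25 Transmembrane 59 - 75 ( 56 - 82)
INTEGRAL Likelihood = -7.48 Transmembrane 23 - 39 ( 12 - 41)
INTEGRAL Likelihood = -1.86 Transmembrane 165 - 181 ( 164 - 182)
"""
segments = sorted(parse_segments(sample), key=lambda s: s["core"][0])
for seg in segments:
    start, end = seg["core"]
    print(f"{start:>4}-{end:<4} length={end - start + 1} likelihood={seg['likelihood']}")
```

Sorting by start position rather than by likelihood lists the helices in N-terminal to C-terminal order, which makes it easier to compare the GBS and GAS predictions side by side.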



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF96429 GB:AE004383 conserved hypothetical protein [Vibrio cholerae] 
Identities = 142/443 (32%) , Positives = 241/443 (54%) , Gaps = 20/443 (4%) 

Query: 6 NEFQFSLESILGFVWRGIWGIilAGFWSIFRLAIEKIFLWMELYKS - -AHYQPIILLS 63 

N+F +4 ++ ++VG++AG V + F A+ + + KS + P+ L + 

Sbjct: 21 NQFLSKDKTPFSVLFLSLLVGILAGLVGTYFEQAVHLVSETRTDWLKSEIGSFLPLWLAA 80 

Query: 64 ITVTSIIAAVIIGFFI--KSDPDIKGSGIPHVEGELKGMLSPDWFSIVWKKFIAGILAIS 121 

+++ + A IG+F+ + P+ GSGIP +EG + GM W+ ++ KF G+ A+ 
Sbjct: 81 FLISAFLA- - FIGYFLVHRFAPEAAGSGI PEIEGAMDGMRPVRWWRVLPVKFFGGMGALG 138 

Query: 122 SGLMLGREGPSIQLGAMTGKGIAQYMASRMEKR-VLIASGAAAGLSAAFNAPIAGLLFV 180 

SG+4-LGREGP++Q+G G+ 1+ + R L+A+GAA GL+AAFNAP+AG++FV 

Sbjct: 139 SG>WLGREGPWQMGGAVGRMISDIFRVKNEDTRHSLLAAGAAGGLAAAFNAPLAGIMFV 198 



Query: 181 VEEIYHHFS- 7-ANFVSLNIFGLTPVLALPSELPSLNLNFYWIFLLMG 238
V I G V+ +P + + L+ +FLL+G
Sbjct: 199 IEEMRPQFRYTLISVRAVIISAVAANIVFRVINGQDAVITMP-QYDAPELSTLGLFLLLG 257

Query: 239 LFLGILGFIYEWVIL-- - -RFHVIYDYLGKLFHLPSHLYGILAVIF1LPIGYYFPQLLGG 294
F + K + L +G ++L Y P+L GG
Sbjct: 258 ALFGVFGVLFNYLITLAQDLFVKFHRNDRKRYLLTGSMIGGCFGLLLL YVPELTGG 313

Query: 295 GNGLIVSLPRSNLSLMMLGLFFLIRFLWSMLSYSSGLPGGIFLPILALGSLAG-AFFAVG 353 

G LI ++ +L L F+ R ++L + SG PGGIF P+LALG+L G AF + 

Sbjct: 314 GISLIPTITNGGYGAGILLLLFVGRIFTTLLCFGSGAPGGIFAPMLALGTLFGYAFGLIA 373 

Query: 354 MQYFGIISHQQISLFVVLGMAGYFGAISKAPLTAMILVTEMVGDLKQLMAIGIVTMVSYI 413 

+F ++ + +F + GM FA +AP+T ++LV EM + ++ + I ++ + I 
Sbjct: 374 KMWFPELNI EP - GMFAI AGMGALFAATVRAPITGILLVIEMTNNYHLI LPLI ITSLGAVI 432 

Query: 414 VMDLLKGEPIYEAMLAKMTFNPK 436 

LL G+PIY +L + N K 
Sbjct: 433 FAQLLGGQPIYSQLLHRTLKNQK 455 



A related DNA sequence was identified in S.pyogenes <SEQ ID 5303> which encodes the amino acid
sequence <SEQ ID 5304>. Analysis of this protein sequence reveals the following:

Possible site: 31
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-11.68 Transmembrane 71 - 87 ( 66 - 9!
INTEGRAL Likelihood = -9.45 Transmembrane 36 - 52 ( 26 - 5S
INTEGRAL Likelihood = -5.63 Transmembrane 346 - 362 ( 342 - 367)
INTEGRAL Likelihood = -5.36 Transmembrane 376 - 392 ( 375 - 393)
INTEGRAL Likelihood = -5.15 Transmembrane 413 - 429 ( 410 - 432)
INTEGRAL Likelihood = -5.10 Transmembrane 321 - 337 ( 318 - 340)
INTEGRAL Likelihood = -4.19 Transmembrane 203 - 219 ( 202 - 220)
INTEGRAL Likelihood = -4.19 Transmembrane 244 - 260 ( 242 - 265)
INTEGRAL Likelihood = -4.19 Transmembrane 284 - 300 ( 280 - 304)
INTEGRAL Likelihood = -1.86 Transmembrane 177 - 193 ( 176 - 194)


Final Results

bacterial membrane --- Certainty=0.5670 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the 

>GP:AAF96429 GB:AE004383 conserved hypothetical protein [Vibrio cholerae] 
Identities = 144/442 (32%) , Positives = 236/442 (52%) , Gaps = 30/442 (6%) 



Query: 18 NEFTFSNKSIIAYVWRGVWGIIAGVIVSLFRLLIEWADWIEWYRYAHINSLLLLPIL 77 
N+F 4K+ + ++ ++VGI+AG++ + F 4 + ++ +W++ISLL+ 
Sbjct: 21 NQFLSKDKTPFSVLFLSLLVGILAGLVGTYFEQAVHLVSETRTDWLK-SEIGSFLPLWLA 79

Query: 78 SVSLLAVL- WGFLV--KSDSDIKGSGIPHWGELKGLMSPDWWSVLWKKFLGGIMAISM 134 

+ + A L F+G+ + + + GSGIP +EG + G+ WW VL KF GG+ A+ 
Sbjct: 80 AFLISAFIiAFIGYFLVHRFAPEAAGSGIPEIEGAMDGMRPVRWWRVLPVKFFGGMGALGS 139 

Query: 135 GFMLGREGPSIQLGAMSAKGLAKFLKSSRLEKR-VLIASGAAAGLSAAFNAPIAGLLFW 193 

G +LGREGP++Q+G + ++ + + R L+A+GAA GL+AAFNAP+AG++FV+ 
Sbjct: 140 GMVLGREGPTVQMGGAVGRMISDIFRVKNEDTRHSLliAAGftAGGLAAAFNAPLAGIMFVI 199 

Query: 194 EEIYHHFS-RLIWITALVASLV-ANFISLNIFGLKPVLAMSEAMPFLGLNQYWLLLLLGL 251 

EE+ F LI + A++ S V AN + I G V+ M + L+ L LLLG 

Sbjct: 200 EEMRPQFRYTLISVRAVIISAVAANIVFRVINGQDAVITMPQ-YDAPELSTLGLFLLLGA 258 

Query: 252 FLGCLGYLYEIVIL NFNKLYVILGSWLHLPDYFYGIIMVFLILPIGYYL 300 

G G L+ +1 N K Y++ GS + +G++++ Y+ 
Sbjct: 259 LFGVFGVLFNYLITLAQDLFVKFHRNDRKRYLLTGSMI GGCFGLLLL YV 307 

Query: 301 PQLLGGGHGLILSLSNQQLPLMTIFFYFIIRFIVSMFSYGSGLPGGIFLPILTLGALAGL 360 

P+L GGG LI +++N + F+ R ++ +GSG PGGIF P+L LG L G 

Sbjct: 308 PELTGGGI SLI PTITNGGYGAGILLLLFVGRIFTTLLCFGSGAPGGI FAPMLALGTLFGY 367 

Query: 361 LFGQIASQLGLLNQSFLSLFLILGMAGYFAAISKAPLTGMILVTEMVGDLKPLMAIAWT 420 

FG IA +F I GM FAA +AP+TG++LV EM + ++ + + + 

Sbjct: 368 AFGLIAKMWFPELNIEPGMFAIAGMGALFAAT\'RAPITGILLVIEMTNNYHLILPLIITS 427 

Query: 421 FVSYLVMDLLNGQPIYEAMLDK 442 

+ + LL GQPIY +L + 
Sbjct: 428 LGAVI FAQLLGGQPI YSQLLHR 449 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 343/510 (67%) , Positives = 410/510 (80%) 



Query: 1 MEIfflKNEFQFSLESILGFVWRGIVVGLIAGFWSIFRLAIEKIFLVvMELYKSAHYQPII SO 

MENHKNEF FS +SI+ + VWRG+WG+ IAG +VS+FRL IE V+E Y+ AH ++ 

Sbjct: 13 MENHKNEFTFSNKSIIAYVWRGVVVGIIAGVIVSLFRLLIEVTADWIEWYRYAHINSLL 72 

Query: 61 LLSITVTSIIAAVIIGFFIKSDPDIKGSGIPHVEGELKGMLSPDWFSIVWKKFIAGILAI 120 

LL I S++A + +GF +KSD DIKGSGIPHVEGELKG++SPDW+S++WKKF+ GI+AI 
Sbjct: 73 LLPILSVSLLAVLFVGFLVKSDSDIKGSGIPHVEGELKGLKSPDWWSVLWKKFLGGIMAI 132 

Query: 121 SSGLMLGREGPSIQLGAMTGKGIAQYLNASRMEKRVLIASGAAAGLSAAFNAPIAGLLFV 180 

S G MLGREGPSIQLGAM+ KG+A++L +SR+EKRVLIASGAAAGLSAAFNAPIAGLLFV 
Sbjct: 133 SMGFMLGREGPSIQLGAMSAKGLAKFLKSSRLEKRVLIASGAAAGLSAAFNAPIAGLLFV 192 



Query: 181 VEEIYHHFSRLWITALVASLVANFVSLNIFGLTPVLALPSELPSLNLNFYWIFLLMGLF 240 
VEEIYHHFSRL+WITALVASLVANF+SLNIFGL PVLA+ +P L LN YW+ LL+GLF
Sbjct: 193 VEEIYHHFSRLIWITALVASLVANFISLNIFGLKPVLAMSEAMPFLGLNQYWLLLLLGLF 252 

Query: 241 LGII£FIYEWILRFHVIYDYLGKLFHLPSHLYGILAVIFILPIGYYFPQLLGGGNGLIV 300 
5 LG LG++YE Vlh F+ +Y LG HLP + YGI+ V ILPIGYY PQLLGGG+GLI+ 

Sbjct: 253 LGCLGYLYEIVILNFNKLWILGSWLHLPDYFYGIIMVFLILPIGYYLPQLLGGGHGLIL 312 



Query: 301 SLPRSNLSLMMLGLFFLIRFLWSMLSYSSGLPGGIFLPILALGSLAGAFFAVGMQYFGII 360 

SL L LM + +F+1RF+ SM SY SGLPGGIFLPIL LG+LAG F G++ 
Sbjct: 313 SLSNQQLPLMTIFFYFIIRFIVSMFSYGSGLPGGIFLPILTLGALAGLLFGQIASQLGLL 372 

Query: 361 SHQQISLFVVLG^GYFGAISKAPLTAMILVTEWGDLKQLMAIGIVTMVSYIVMDLLKG 420 

+ +SLF++LGMAGYF AISKAPLT MILVTEMVGDLK LMAI +VT VSY+VMDLL G 
Sbjct: 373 NQSFLSLFLILGMAGYFAAISKAPLTGMILVTEMVGDLKPLMAIAWTFVSYLVMDLLNG 432 



+PIYEAML KM ++ PTLIELTV DKI +GKYV+ + L+LPENVL I TTQIHH+ S V 

Sbjct: 433 QPIYFAMLDKMAMKHPTNLVEPTLIELTVGDKIAGKYVKELKLPENVLITTQIHHQKSQV 492 

Query: 481 VSGNTILNAGDTIFLWNESEIKEVREQLM 510 

VSGNT L +G TIFLWNE++ VRE LM 
Sbjct: 493 VSGNTRLLSGAT I FLWNEADTGFVREVLM 522 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
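
The alignment headers in this section report counts and percentages in the form "Identities = 343/510 (67%) , Positives = 410/510 (80%)". As a minimal sketch for cross-checking such headers, the following Python helper extracts each fraction and recomputes the percentage. The function name `parse_alignment_header` and the returned field names are illustrative assumptions and are not part of the source; the printed percentage may differ from the recomputed one by rounding, so the two values are kept side by side rather than compared.

```python
import re

# Matches e.g. "Identities = 343/510 (67%)"; tolerant of stray spaces and of
# an OCR "-" in place of "=". Hypothetical helper, not taken from the source.
HEADER_FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*[=-]\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_alignment_header(line):
    """Return {field: {count, length, reported_pct, recomputed_pct}}."""
    stats = {}
    for name, num, den, pct in HEADER_FIELD.findall(line):
        count, length = int(num), int(den)
        stats[name] = {
            "count": count,
            "length": length,
            "reported_pct": int(pct),
            # Recomputed as a float; the rounding rule of the tool that
            # printed the header is not assumed here.
            "recomputed_pct": 100.0 * count / length,
        }
    return stats


header = "Identities = 343/510 (67%) , Positives = 410/510 (80%)"
for field, values in parse_alignment_header(header).items():
    print(field, values)
```

A header whose recomputed value is far from the reported one is a candidate for a transcription error in either the count or the percentage.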



Example 1707 

A DNA sequence (GBSx1811) was identified in S.agalactiae <SEQ ID 5305> which encodes the amino
acid sequence <SEQ ID 5306>. This protein is predicted to be spermidine/putrescine-binding periplasmic
protein precursor (potD-1). Analysis of this protein sequence reveals the following:

Possible site: 38
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -9.02 Transmembrane 20 - 36 ( 14 - 40)

Final Results

bacterial membrane --- Certainty=0.4609 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8881> which encodes amino acid sequence <SEQ ID 8882>
was also identified. Analysis of this protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 2
SRCFLG: 0
McG: Length of UR: 22
Peak Value of UR: 4.16
Net Charge of CR: 2
McG: Discrim Score: 18.94
GvH: Signal Score (-7.5): -3.29
Possible site: 25
>>> Seems to have an uncleavable N-term signal seq
Amino Acid Composition: calculated from 1
ALOM program count: 1 value: -9.02 threshold: 0.0
INTEGRAL Likelihood = -9.02 Transmembrane 7 - 23 ( 1 - 27)
PERIPHERAL Likelihood = 6.05 170
modified ALOM score: 2.30
icml HYPID: 7 CFP: 0.461

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.4609 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF94581 GB:AE004221 spermidine/putrescine ABC transporter,
periplasmic spermidine/putrescine-binding protein [Vibrio cholerae]
Identities = 126/327 (38%), Positives = 196/327 (59%), Gaps = 2/327 (0%) 

Query: 42 SSSTPNSDKLVIYNWGDYIDPALLKKFTKETGIEVQyETFDSNEAMHTKIKQGGTTYDIA 101 

+++ +L YNW +YI +L+ FTKETGI+V Y T++SNE+M+ K+K G YD+ 

Sbjct: 18 TNAMAKDQELYFYNWSEYIPSEVLEDFTKETGIKVIYSTYESNESMYAK1,KTQGAGYDLV 77 

Query: 102 VPSDYMIDKMIKENLLVICLDHSKIANWDAIGARFKNLSFDPKNKYSIPYFWGTVGIVYN- 160 

VPS Y + KM KE +L ++DHSK++++ + + N FDP NK+SIPY WG GI N 
Sbjct: 78 VPSTYFVSKMRKEGMLQEIDHSKLSHFKDLDPNYLNKPFDPGNKFSIPYIWGATGIGINT 137 

Query: 161 DQLVKTPPKHWDDLVffiPEFRNKIMLVDSAREVIGVGLNSLGYGH 220 

D L K K+W DLW ++ ++ML+D AREV + L+ LGY NT N E+KAA ++L 
Sbjct: 138 DMLDKKSLKIWGDLWDAKWAGQI^LMDDAREVFHIALSKLGYSPNTTNPKEIKAAYRELK 197 

Query: 221 ALTPNVKAIVADEMKGYMIQGDAAIGVTFSGEAREMLDGNKHLHYWPSEGSNLWFDNIV 280 

L PNV +D + G+ ++G+ ++G A + + P +G+ W D+I 

Sbjct: 198 laMPNVLVFNSDFPANPYLAGEVSLGmVINGSAYMARQEGAPIQIIWPEKGTIFVIMDSIS 257 

Query: 281 IPKTVKHRKEAYAFINFMMEPKNAAQNAEYIGYATPNLKAKALLPADIKNDKAFYPPDKT 340 

IP K+ + A+ I+F++ P+NAA+ A IGY TP A LLP + ND + YPP 
Sbjct: 258 IPAGAKNIEAAHKMIDFLLRPENAAKIALEIGYPTPVKTAHDLLPKEFANDPSIYPPQSV 317 

Query: 341 IDHLEVYNNLGQKWLGIYNDLYLQFKM 367 

ID+ E + +G+ + +Y++ + + K+ 
Sbjct: 318 IDNGEWQDEVGEASV-LYDEYFQKLKV 343 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5307> which encodes the amino acid 
sequence <SEQ ID 5308>. Analysis of this protein sequence reveals the following: 

Possible site: 22 
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -8.44 Transmembrane 8 - 24 ( 1-27) 



Final Results 

bacterial membrane --- Certainty=0.4376 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAC74207 GB:AE000212 spermidine/putrescine periplasmic transport
protein [Escherichia coli] 
Identities = 134/342 (39%) , Positives = 199/342 (58%) , Gaps = 3/342 (0%) 



Query: 17
Sbjct:
77
Sbjct: 68
Query: 136
Sbjct: 128
195
Sbjct: 188

PY WG IN TO Kt W DLW+PEYK S++L D ARE+



Query: 255 PSEGSNLWFDNLVLPKTMKHEKEAYAF1JSFINRPENAAQNAAYIGYATPNKKAKALLPDE 314 
P EG W D+L +P K+++ A +NF+ RP+ A Q A IGY TPN A+ LL E
Sbjct: 248' PKEGGIFWMDSLftlPANAKNKEGALKLINFLLRPDVAKQVAETIGYPTPNIAARKIiLSPE 307 

Query: 315 IKNDPAFYPTDDIIKKLEVYDNLGSRWLGIYNDLYLQFKMYR 356 
5 + ND YP + IK E +++G4 IY + Y + K R 

Sbjct: 308 VANDKTLYPDAETI KNGEWQNDVGAA- SSIYEEYYQKLKAGR 348 

An alignment of the GAS and GBS proteins is shown below.

Identities = 270/357 (75%) , Positives = 306/357 (85%) 


Query: 14 MRRVYSFLGGIVLVILILFGLTTYLEKKSSSTP^SDKLVIYNVJGDYIDPALLKKFTKETG 73 

MR++YSFL G++ VI + IL L+ L+KKS S SDKLVIYNWGDYIDPALLKKFTKETG 
Sbjct: 1 MRKLYSFLAGVLGVIVILTSLSFILQKKSGSGSQSDKLVIYNWGDYIDPALLKKFTKETG 60 

Query: 74 IEVQYETFDSNEAMHTKIKQGGTTYDIAVPSDYMIDKMIKENLLVKLDHSKIANWDAIGA 133

IEVQYETFDSNEAM+TKIKQGGTTYDIAVPSDY IDKMIKENLL KLD SK+ D IG
Sbjct: 61 IEVQYETFDSNEAMYTKIKQGGTTYDIAVPSDYTIDKMIKENLLNKLDKSKLVGMDNIGK 120

Query: 134 RFKNLSFDPKNKYS I PYFWGWGI VYNDQLVKTPPKHWDDLWRPEFRNKIMLVDSAREVI 193 
20 F SFDP+N YS+PYFWGTVGI VYNDQLV P HW+DLWRPE++N IML+D ARE++ 

Sbjct: 121 EFLGKSFDPQOTDYSLPYFWGTOGIVYimQLVDKAPMHWEDr.WRPEYKNSIMLIDGAREML 180 

Query: 194 GVGLNSLGYGLNTKNISELKAASKKLDALTPNVKAIVADEMKGYMIQGDAAIGVTFSGEA 253 
GVGL + GY +N+KN+ +L+AA +KL LTPNVKA1VADEMKGYMIQGDAAIG+TFSGEA 
25 Sbjct: 181 GVGLTTFGYSVNSKNLEQLQAAERKLQQLTPNVKAIVADEMKGYMIQGDAAIGITFSGEA 240 

Query: 254 REMLDGNKHLHYWPSEGSNLWFDNIVIPKTVKHRKEAYAFINFMMEPKNAAQNAEYIGY 313 

EMLD N+HLHY+VPSEGSNLWFDN+V+PKT+KH KEAYAF+NF+ P+NAAQNA YIGY 
Sbjct: 241 SEMLDSNEHLHYIVPSEGSl^WFDNLVLPKTMKHEKEAYAFLNFINRPENAAQNAAYIGY 300 


Query: 314 ATPNLKAKALLPADIKNDKAFYPPDKTIDHLEVYNNLGQKWLGIYNDLYLQFKMYRK 370 

ATPN KAKALLP +IKND AFYP D I LEVY+NLG +WLGIYNDLYLQFKMYRK 
Sbjct: 301 ATPNKKAKALLPDEIKNDPAFYPTDDIIKKIiEVYDNLGSRWLGIYNDLYLQFKMYRK 357 

SEQ ID 8882 (GBS135) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 35 (lane 6; MW 40kDa).

GBS135-His was purified as shown in Figure 201, lane 10.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
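
Each analysis in this section ends with a "Final Results" block giving a certainty for the membrane, outside and cytoplasm locations. The following Python sketch shows one way such a block could be read programmatically. The names `parse_final_results` and `predicted_location` are hypothetical, and the regular expression is an assumption designed to tolerate the spacing damage seen in this text (for example "Certainty=0 . 4609"); it is not part of the localisation program itself.

```python
import re

# One "Final Results" row: location, certainty, verdict in parentheses.
# Spaces inside the number are allowed because the source text contains
# forms such as "0 . 4609" and "0. 0000".
RESULT_LINE = re.compile(
    r"bacterial\s+(membrane|outside|cytoplasm)\W*"
    r"Certainty\s*=\s*([\d.\s]+?)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Map each location to a (certainty, verdict) tuple."""
    results = {}
    for location, raw_value, verdict in RESULT_LINE.findall(text):
        certainty = float(re.sub(r"\s+", "", raw_value))
        results[location] = (certainty, verdict.strip())
    return results


def predicted_location(results):
    """Return the location with the highest certainty."""
    return max(results, key=lambda loc: results[loc][0])


block = """
bacterial membrane Certainty=0 . 4609 (Affirmative) < suco
bacterial outside --- Certainty=0 . 0000 (Not Clear) < suco
bacterial cytoplasm --- Certainty=0. 0000 (Not Clear) < suco
"""
parsed = parse_final_results(block)
print(parsed)
print(predicted_location(parsed))
```

The sample block is the GBSx1811 result as it appears above; the trailing "< suco" text is ignored by the pattern.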

Example 1708

A DNA sequence (GBSx1812) was identified in S.agalactiae <SEQ ID 5309> which encodes the amino
acid sequence <SEQ ID 5310>. This protein is predicted to be spermidine/putrescine ABC transporter,
permease protein (potC). Analysis of this protein sequence reveals the following:

Possible site: 51
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood =-12.05 Transmembrane 17 - 33 ( 10 -
INTEGRAL Likelihood = -8.65 Transmembrane 236 - 252 ( 232 - 259)
INTEGRAL Likelihood = -7.75 Transmembrane 137 - 153 ( 132 - 158)
INTEGRAL Likelihood = -7.17 Transmembrane 63 - 79 ( 60 -
INTEGRAL Likelihood = -6.32 Transmembrane 108 - 124 ( 107 -

Final Results

bacterial membrane --- Certainty=0.5819 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8883> which encodes amino acid sequence <SEQ ID 8
was also identified. Analysis of this protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 2
SRCFLG: 0
McG: Length of UR: 26
Peak Value of UR: 3.65
Net Charge of CR: 2
McG: Discrim Score: 16.58
GvH: Signal Score (-7.5): -6.17
Possible site: 43
>>> Seems to have an uncleavable N-term signal seq
Amino Acid Composition: calculated from 1
ALOM program count: 4 value: -12.05 threshold: 0.0
INTEGRAL Likelihood =
INTEGRAL Likelihood =
INTEGRAL Likelihood =
INTEGRAL Likelihood =
PERIPHERAL Likelihood =
modified ALOM score: 2.91
icml HYPID: 7 CFP: 0.582

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.5819 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>


The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB91527 GB:AE001165 spermidine/putrescine ABC transporter, 
permease protein (potC) [Borrelia burgdorferi] 
Identities = 97/249 (38%), Positives = 159/249 (62%), Gaps = 3/249 (1%)



Sbj ct : 



KKFANIYLALVFIILYIPIIYLIFYSFNKGGDMNSFTGFTFSHYGELFQDSRLMLILVQT 69 
+ F NI+L L+ +Y+PII LI' YSFN G + GF+ Y E+F S++ + T 

RAFKNIFLFLILSFI YLPI I ILI I YSFNSGDSGFIWQGFSLKWYKEI FASSQI KSAI FTCT 62 



Query: 70 FFLAFLSALLATIIGTFGAIWIYQVRRRH-QTSILSLNNILLVAPDVMIGASFLLVFTVI 128 

+A +S+L + +IG GA IY+ + +T +LS+N I ++ PD++ G S + ++ I 
Sbjct: 63 ILIAI ISSLTSWIGI IGAYAIYKSENKKLKTILLSVNKITI INPDIVTGISLMTFYSAI 122 

Query: 129 GLQLGFTSVLLSHVAFSIPIVVLMVLPRLKENINDDMINASYDLGASTWQMLKEVMLPYLS 188 

+QLGF+++L+SH+ FS P W+++LP+L + ++I+A+ DLGAS Q+ ++ P ++ 
Sbjct: 123 KMQLGFSTMLISHI IFSTPYWI I ILPKLYSLPKNIIDAAKDLGASEIQIFFNI IYPEIA 182 

Query: 189 SGIISGFFMAFTYSLDDFAVTFFVTGNGFSTLSVEIYSRARRGISLEINALSTIVF--LF 246 

I +G +AFT S+DDF ++FF TG GF+ LS+ I S +RGI INA+S I+F + 
Sbjct: 183 GSIATGALIAFTLSIDDFLISFFTTGQGFNNLSILINSLTKRGIKPVINAISAILFFTIL 242 

Query: 247 SILLVIGYY 255 

S+L +1 + 
Sbjct: 243 SLLFIINKF 251 



A related DNA sequence was identified in S.pyogenes <SEQ ID 5311> which encodes the amino acid
sequence <SEQ ID 5312>. Analysis of this protein sequence reveals the following:

Possible site: 49
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -8.17 Transmembrane 9 - 25 (
INTEGRAL Likelihood = -8.12 Transmembrane 228 - 244 (
INTEGRAL Likelihood = -7.91 Transmembrane 129 - 145 (
INTEGRAL Likelihood = -7.06 Transmembrane 62 - 78 (
INTEGRAL Likelihood = -3.93 Transmembrane 100 - 116 (

Final Results

bacterial membrane --- Certainty=0.4270 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAB91527 GB:AE001165 spermidine/putrescine ABC transporter, 
permease protein (potC) [Borrelia burgdorferi] 
Identities = 91/249 (36%) , Positives = 154/249 (61%) , Gaps = 3/249 (1%) 

KKFANLYIASVFVLLYIPIFYLIFYSFNKGGDt^NQFrGFTLEHYQTMFEDSRLMTILLQT 61 
+ F N++L + +Y+PI LI YSFN G + GF+L+ Y+ +F S++ + + T 

RAFKNIFLFLILSFIYLPIIILIIYSFNSGDSGFIWQGFSLKWYKEIFASSQIKSAIFNT 62 



2
Sbjct: 3
Query: 62
Sbjct: 63
121
Sbjct: 123
Query: 181
Sbjct: 183
239
Sbjct: 243

K + +LS N 4
K QLG S++L+SHI FS P W+++LP+L
I G +AFT S+DDF ++FF TG LS+ I S ++GI INA+S I+FF
SLLFIINKF 251

An alignment of the GAS and GBS proteins is shown below. 

Identities = 196/258 (75%) , Positives = 231/258 (88%) 

MK3CFANIYLALVFIILYIPIIYLIFYSFNKGGDMNSFTGFTFSHYGELFQDSRLMLILVQ 6 
MKKFAN+YLA VF++LYIPI YLIFYSFNKGGDMN FTGFT HY +F+DSRLM IL+Q 
MKKFANLYLASVFVLLYI PIFYLI FYSFNKGGDMNGFTGFTLEHYQTMFEDSRLMTILLQ 6 





9
Sbjct: 1
69
Sbjct:
Query: 129
Sbjct:
Query: 189
Sbjct: 181
Query: 249
Sbjct: 241

GII+G+FMAFTYSLDDFAVTFF+TGN +TLSVEIYSRAR+GISL+INALSTIVF FSI
PGIIAGYFMAFTYSLDDFAVTFFLTGNSVITLSVEIYSRARQGISLDINALSTIVFFFSI 240

LLVIGYYYISKEKGEKNA 266
LLVIGYYY+S++K EK+A
LLVIGYYYMSQDKEEKHA 258

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
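
The permease analyses above list several "INTEGRAL Likelihood = ... Transmembrane start - end" rows per protein. As a rough sketch, the Python helper below collects those rows and orders the predicted segments along the sequence. The function name `parse_transmembrane` is hypothetical, and the pattern is an assumption written to tolerate the irregular spacing in this text (for example "=-12.05" and "63- 79"); rows whose likelihood or position did not survive extraction are skipped.

```python
import re

# One row such as "INTEGRAL Likelihood = -8.65 Transmembrane 236 - 252 ( 232 - 259)".
# Only the likelihood and the core segment (start, end) are captured; the
# parenthesised extended range is ignored because it is often truncated here.
TM_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\s*\d+\s*\.\s*\d+)\s+"
    r"Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_transmembrane(text):
    """Return [(likelihood, start, end), ...] sorted by start position."""
    segments = []
    for raw_score, start, end in TM_LINE.findall(text):
        score = float(re.sub(r"\s+", "", raw_score))
        segments.append((score, int(start), int(end)))
    return sorted(segments, key=lambda seg: seg[1])


rows = """
INTEGRAL Likelihood =-12.05 Transmembrane 17 - 33 ( 10 -
INTEGRAL Likelihood = -8.65 Transmembrane 236 - 252 ( 232 - 259)
INTEGRAL Likelihood = -7.17 Transmembrane 63- 79 ( 60-
"""
for score, start, end in parse_transmembrane(rows):
    print(f"{start:4d}-{end:<4d} likelihood {score:.2f}")
```

Counting the returned tuples gives the number of predicted membrane-spanning segments that could be recovered from the text for a given protein.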

Example 1709 

A DNA sequence (GBSx1813) was identified in S.agalactiae <SEQ ID 5313> which encodes the amino
acid sequence <SEQ ID 5314>. This protein is predicted to be spermidine/putrescine ABC transporter,
permease protein (potB). Analysis of this protein sequence reveals the following:

Possible site: 35
>>> Seems to have a cleavable N-term signal seq.
INTEGRAL Likelihood = -9.55 Transmembrane 250 - 266 ( 244 -
INTEGRAL Likelihood = -3.93 Transmembrane 148 - 164 ( 146 -
INTEGRAL Likelihood = -3.35 Transmembrane 65 - 81 ( 64 -
INTEGRAL Likelihood = -1.97 Transmembrane 96 - 112 ( 96 -

Final Results

bacterial membrane --- Certainty=0.4821 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9853> which encodes amino acid sequence <SEQ ID 9854>
was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC22990 GB:U32813 spermidine/putrescine ABC transporter, 
permease protein (potB) [Haemophilus influenzae Rd] 
Identities = 90/255 (35%) , Positives = 153/255 (59%) , Gaps = 11/255 (4%) 





21
Sbjct: 18
Query: 75
Sbjct: 77
Query: 133
Sbjct: 137
191
Sbjct: 197
251
Sbjct: 257

T+ NY P+ Y ++ +NS+ +GI
[ LLI YP A++++K+ K++ L L LV+LP II B I
+VI PL++ G+ AG
L+ A+ DLGAN +Q F
L+GG +V+ +G 1+

ILILVMVAIMWL 265
L ++M ++++
3LTVLMALL I FV 271



A related DNA sequence was identified in S.pyogenes <SEQ ID 5315> which encodes the amino acid 
sequence <SEQ ID 5316>. Analysis of this protein sequence reveals the following: 



INTEGRAL Likelihood = Transmembrane 19 -
INTEGRAL Likelihood = Transmembrane 250 -
INTEGRAL Likelihood = Transmembrane 65 -
INTEGRAL Likelihood = Transmembrane 96 -
INTEGRAL Likelihood =
91
148 - 165;

Final Results

bacterial membrane --- Certainty=0.3951 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:AAC22990 GB:U32813 spermidine/putrescine ABC transporter, 
permease protein (potB) [Haemophilus influenzae Rd] 
Identities = 91/262 (34%) , Positives = 158/262 (59%) , Gaps = 11/262 (4%) 

Query: 20 FLWILFFWAPVTLLFYKSFFDIEGR VTLANYETFFSSWTYLRMSVNS ILYAGI 73 

F W++FFV+ P L+ SF +G +T+ NY F+ Y ++ NS+ +GI 

Sbjct: 17 FSWLIFFVLIPNLLVIiAVSFLTRIXSSNFYAFPITIENYTNLFNP-LYAQvvWNSLSMSGI 75 
74
Sbjct: 76
132
Sbjct: 136
190
Sbjct: 196
250
Sbjct: 256

F +VI PL++ G+ AG V +P++ +F + L+GG +V+ +G 1+ FL ++NW

An alignment of the GAS and GBS proteins is shown below.

Identities = 215/266 (80%) , Positives = 239/266 (89%)

Query: RHJEMKKTSSLFSIPYMA.WLFLFVLAPVALIAKKSFFDINGHFTLAMYQTFFSSGTVLKM 63
RR MKKTSSLFSIPY W+ FV+APV L+ + SFFDI G TIANY4TFFSS TYL+M
Sbjct: RRSVMKKTSSLFSIPYFLWILFFWAPVTLLFYKSFFDIEGRVTLANYETFFSSWTYLRM 63

Query: 64 SFNSVLYAGIVSFITLLISYPAAYLLTKLKHKQLWLMLVILPTWINLLLKAYAFMGIFGQ 123
S NS+LYAGI++ +TLLISYP A LT+LKHKQLWLML+ILPTW+NLLLKAYAFMGIFGQ
Sbjct: 64 SVNSILYAGIITLVTLLISYPTALFLTRLKHKQLWLMLIILPTWVNLLLKAYAFMGIFGQ 123

Query: 124 QGGINAFLTFIGIGPKQILFTDFSFLFVAAYIELPFMLLPIFNALDDIDQNLIYASDDLG 183
QGGIN+FLTF+GIGP+QILFTDFSF+FVA+YIELPFM+LPIFNALDDID N+I AS DLG
Sbjct: 124 QGGINSFLTFMGIGPQQILFTDFSFIFVASYIELPFMMLPIFNALDDIDHNVINASRDLG 183

Query: 184 ANAWQTFQKVIFPLSLNGVRAGVQSVFIPSLSLFMLTRLIGGNRVITLGTAIEQHFLITQ 243
A+ +Q F KVIFPLSLNGVRAGVQSVFIPSLSLFMLTRLIGGNRVITLGTAIEQHFL TQ
Sbjct: 184 ASEFQAFSKVIFPLSLNGVRAGVQSVFIPSLSLFMLTRLIGGNRVITLGTAIEQHFLTTQ 243

Query: 244 NKGMGSTIGVILILVMVAIMWLTKER 269
N GMGSTIGV+LIL MVAIMWLTKE+
Sbjct: 244 NWGMGSTIGWLILTMVAIMWLTKEK 269

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1710 

A DNA sequence (GBSx1814) was identified in S.agalactiae <SEQ ID 5317> which encodes the amino
acid sequence <SEQ ID 5318>. This protein is predicted to be spermidine/putrescine ABC transporter,
ATP-binding protein (potA). Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0. 3031 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB91525 GB:AE001165 spermidine/putrescine ABC transporter, 
ATP-binding protein (potA) [Borrelia burgdorferi] 
Identities = 166/345 (48%), Positives = 240/345 (69%), Gaps = 1/345 (0%) 
Sbjct: 1 MDNCILEIKTO^SHYYDNNGNKTLDNINLKIKKNEFITLIjGPSGCGKTTLIKILGGFLSQK 60

Query: 61 TGDIYLDGKRIlTOVPTNKRDWTVFQNYMiFPHMVFFJIVAFPLKLKKMDKKEIQKRVQE 120 

G+IY K 1+ NKR+++TVFQNYAt,FPHM VF+N++F L++KK K I+++V+ 
Sbjct: 61 NGEI YFFSKE I SKTS PNKRE INTVFQNYALFPHMNVFDHI S FGLRMKKTPKDI I KEKVKT 120 

Query: 121 TLKMVRLEGFEKRAI QKLSGGQRQRVAEARAI INQPKWLLDE PLSALDLKLRTEMQYEL 180 

+L ++ + + R I +LSGGQ4QRVAIARA++ +PK++LLDEPLSALDLK+R EMQ EL 
Sbjct: 121 SLSLIGMPKYAYRNINELSGGQKQRVAIARAMVMEPKLLLLDEPLSALDLKMRQEMQKEL 180 

Query: 181 RELQQRLGITFVFVTHDQEEALAMSDWIFVMNEGEIVQSGTPVDIYDEPINHFVATFIGE 240 

+++Q++LGITF++VTHDQEEAL MSD I VMNEG I+Q GTP +IY+EP FVA FIGE 
Sbjct: 181 KKIQRQLGITFIYVTHDQEEALTMSDRIVVMNEGIILQIGTPEEIYNEPKTKFVADFIGE 240 

Query: 241 SNILSGKMIEDYLVEFNGKRFEAVDGGMRPNESVQWIRPEDLQITLPDEGKLQVKVDTQ 3 00 
SNI G ++ +V G FE +D G E+V +VIRPED+++ +G L + + 

Sbjct: 241 SNIFDGTYKKELWSLLGHEFECLDKGFEAEEAVDLVIRPEDVKLLPKGKGHLSGTITSA 300

Query: 301 LFRGVHYEIIAYDDLGNEWMIHSTRKAIEGEVIGLDFTPEDIHIM 345 

+F+GVHYE+ N W++ STR GE + 4 P+DIH+M 

Sbjct: 301 IFQGVHYEMTLEIQKTN-WIVQSTRLTKVGEEVDIFLEPDDIHVM 344 

There is also homology to SEQ ID 1292.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 1711 

A DNA sequence (GBSx1815) was identified in S.agalactiae <SEQ ID 5319> which encodes the amino
acid sequence <SEQ ID 5320>. Analysis of this protein sequence reveals the following:

Possible site: 53
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0 . 4990 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06283 GB:AP001515 UDP-N-acetylenolpyruvoylglucosamine 
reductase [Bacillus halodurans] 
Identities = 119/286 (41%) , Positives = 166/286 (57%) , Gaps = 1/286 (0%)

Sbjct: 15
Query: 73
Sbjct: 75
Query: 132
Sbjct: 135
Query: 192
Sbjct: 195
Query: 252
Sbjct: 255

f VNG I AG +++ + L G EFA GIPGSVGGAV
FMNAGA+G +1+ IL A VL P G L+ + M F YR S++Q++ I + A F+L
QP +P+CGSVF+ P +AGQLI +A LKG +IGG 4
AD LI HV +T++ +++E EV +IGE

A related DNA sequence was identified in S.pyogenes <SEQ ID 5321> which encodes the amino acid
sequence <SEQ ID 5322>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm Certainty=0. 4557 (Affirmative) < suco 

bacterial membrane --- Certainty=0. 0000 (Not Clear) < suco 
bacterial outside Certainty=0. 0000 (Not Clear) < suco 


An alignment of the GAS and GBS proteins is shown below. 

Identities = 229/292 (78%) , Positives = 267/292 (91%) 

Query: 8 ELEGLDIRFDEPLKKYTYTKVGGPMYLAFPRNRLELSRIVKFANSQNIPVJMVLGNASNI 67 
15 EL G+DIR +EPLK YTYTKVGGPAD+LAFPRN ELSRIV +AN +N+PW+VLGNASN+ 

Sbjct: 4 ELHGIDIRENEPLKHYTYTKVaBPADFIAFPRNHYELSRIVAYANKENMPWLVLGNASNL 63 

Query: 68 IVRDGGIRGWIMFDKLSTVTVNGYVIEAFAGANLIETTRIARYHSLTGFEFACGIPGSV 127 
IVRDGGIRGFVIMFDKL+ V +NGY +EAEAGANLIETT+IA++HSLTGFEFACGIPGS+ 
20 Sbjct: 64 IVRDGGIRGFVIMFDKLNAVHLNGYTLEAEAGANLIETTKIAKFHSLTGFEFACGIPGSI 123 

Query: 128 GGAVFMNAGAYGGEIAHILLSAQVLTPQGELKTIEARNMQFGYRHSVIQESGDIVISAKF 187 

GGAVFMNAGAYGGEI+HI LSA+VLTP GE^KTI AR+M FGYRHS IQE+GDIVISAKF 
Sbjct: 124 GGAVFMNAGAYGGE I S H I F LS AKVLT PSGE I KT I SARDMA FG YRH SA I QETGD I VI S AKF 183 


Query: 188 ALKPGDHLMITQEMDRLTYLRELKQPLEYPSCGSVFKRPPGHFAGQLISEAHLKGQRIGG 247 

ALKPG++ I+QEM+RL +LR+LKQPLE+PSCGSVFKRPPGHFAGQLI EA+LKG RIGG 
Sbjct: 184 ALKPGOTDTISQEMNRLNHLRQLKQPIiEFPSCGSVFKRPPGHFAGQLIMEANLKGHRIGG 243 

30 Query: 248 VEVSQKHAGFMVNIAEGSAQDYENLIEHVINTVESTSGVHLEPEVRIIGESL 299 

VEVS+KH GFM+N+A+G+A+DYE+LI +VI TVE+ SGV LEPEVRI IGE+L 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1712 

A DNA sequence (GBSx1816) was identified in S.agalactiae <SEQ ID 5323> which encodes the amino
acid sequence <SEQ ID 5324>. This protein is predicted to be 2-amino-4-hydroxy-6-
hydroxymethyldihydropterin pyrophosphokinase/dihyd. Analysis of this protein sequence reveals the
following:

Possible site: 47 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1122 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB03814 GB:AP001507 2-amino-4-hydroxy-6-hydroxymethyldihydropteridine
pyrophosphokinase [Bacillus halodurans]
Identities = 64/146 (43%) , Positives = 94/146 (63%)

Query: 5 YLSLGSNIGDRETFLKQALFSIDHLQICTIWAQISAIYETAAWGNTNQEDFFNICCQVETD 64

Y++LGSNIGDR FL++A+ + K V S+IYET G T+Q F N+ +V T 
Sbjct: 6 YIALGSNIGDRSRFLEEAIQQIJVEHDKvTVTCCSSIYETDPVGYTDQSPFLNMVVEVSTS 65 
Query: 65 LAPFELLDYCQEIEKCLKRVRHEHWGPRTIDIDILLFGNQVINQEDLWPHPYMTKRAFV 124

L +LL+ Q+IE+ R RH WGPRT+D+DILL+ + E+L++PHP M +RAFV 
Sbjct: 66 LPVEQLLEVTQKIERYCGRERHIRWGPRTIiDLDILLYDQENREMENLIIPHPRMWERAFV 125 

Query: 125 LVPLLEIAPQLSLPNGSKLEDYLEKL 150 

L+PL+E+ P + P+G +E + +L 
Sbjct: 126 LI PLMELNPS IVAPSGKTIEQWREL 151 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5325> which encodes the amino acid 
sequence <SEQ ID 5326>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0. 0479 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 85/156 (54%), Positives = 111/156 (70%), Gaps = 1/156 (0%) 

Query: 1 MTTWLSLGSNIGDRETFLKQALFSIDHLQXTKVAQISAIYETAAWGNTNQEDFFKICCQ 60 

MT VYLSLG+N+GDR +L++AL ++ L +T++ S+IYET AWG T Q DF N+ CQ 
Sbjct: 1 MTIVYLSLGTWGDRAAYLQKALEALADLPQTRLLAQSSIYETTAWGKTGQADFLNI'IACQ 60 

Query: 61 VETDLAPFELLDYCQEIEKCLKRVRHEHWGPRT1DIDILLFGNQVINQEDLWPHPYMTK 120 

++T L + L Q IE+ L RVRHE WG RTIDIDILLFG +V + ++L VPHPYMT+ 
Sbjct: 61 LDTQLTAADFLKETQAIEQSLGRVRHEKWGSRTIDIDILLFGEEVYDTKELKVPHPYMTE 120 

Query: 121 RAFVLVPDLEIAPQLSLPNGSK-LEDYLEKLNLGEV 155 

RAFVL+PDLE+ P L LP K L DYL L+ ++ 
Sbjct: 121 RAFVLI PLLELQPDLKLPPNHKFLRDYLAALDQSDI 156 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1713 

A DNA sequence (GBSx1817) was identified in S.agalactiae <SEQ ID 5327> which encodes the amino
acid sequence <SEQ ID 5328>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0. 2826 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5329> which encodes the amino acid 
sequence <SEQ ID 5330>. Analysis of this protein sequence reveals the following: 

Possible site: 50
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3547 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 75/119 (63%) , Positives = 92/119 (77%)

Query: 1 ^KIYLNKCRFYGYHGAFSEEQTLGQVJXJVDAVIiSLDLftKASQTDDLIDTVHYGEVFDCI 60 

MDKI L CRFYGYHGAF EEQTLGQ+F VD LS+DL AS +D L DTVHYG VFD + 
Sbjct: 1 MKIVLEGCRFYGYHGAFKEEQTLGQIFLVDLELSVDLQAASLSDQLTDTVHYGMVFDSV 60 

+ VE E++ LIE+LAG I E +F +F P + +AI + I K+NPPI GHY+ +VGIELER+R 
Sbjct: 61 RQLVEGEKFILIERIAGAICEQLFNEFPPIEAIKVAIKKENPPIAGHYKAVGIELERQR 119 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1714 

A DNA sequence (GBSx1818) was identified in S.agalactiae <SEQ ID 5331> which encodes the amino
acid sequence <SEQ ID 5332>. Analysis of this protein sequence reveals the following:

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 5333> which encodes the amino acid
sequence <SEQ ID 5334>. Analysis of this protein sequence reveals the following:

Possible site: 26
>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 181/267 (67%), Positives = 224/267 (83%), Gaps = 1/267 (0%)

Query: 1 MKIGQYDITGKACIMGILNVTPDSFSDGGSYTTIDSAIJNIQVGEMLEQGVAIVDIGGESTR 50 

MKIG++ I G A IMGILNVTPDSFSDGGSYTT+ AL+ V +M+ G I+D+GGESTR 
Sbjct: 1 MKIGKFVIEGNAAIMGILNVTPDSFSDGGSYTTVQIGUliDHVEQMliyDGAKIIDVGGESTR 60 

40 

Query: 61 PGAVFVTAEEEIKRWPMIKAIRSVYFDLLLSIDTYKTEVAQAALDAGVHILNDVWSGLY 120 

EG FV+A +EI RWP+IKAI+E Y D+L+SIDTYKTE A+AAL+AG IUSIDVW+GLY 
Sbjct: 61 EGCQFVSATDEIDRWPVIKAIKENY-DILISIDTYKTETARAALEAa^DILNDVWAGLY 119 

45 Query: 121 DGKMLSLAAERNVP 1 1 LMHNQEEAVYQD1 KKEVCEFLLERAERALEAGVSKDNIWIDPGF 180 

DG+M +LAAE + PIILMHNQ+E VYQ++ ++VC+FL RA+ AL+AGV K+NIW+DPGF 
Sbjct: 120 DGQMFAIAAEYDAPIILMHNQDEEVYQEVTQDVCDFLGNRAQAALDAGVPKNNIWVDPGF 179 

Query: 181 GFAKTEEQI^ELLKGLEQVCDIX3YPvI,FGISRKRTVNYLLGGNREVTERDMGTAAIjSAWA 240 
50 GFAK+ +QN ELLKGL++VC LGYPVLFGI SRKR V+ LLGGN + ERD TAALSA+A 

Sbjct: 180 GFAKSVQQNTELLKGLDRVCQLGYPVLFGISRKRVVDALLGGNTKAKERDGATAALSAYA 239 

Query: 241 IAKGCQIVRVHNVEVNKDIVTVISQLV 267 
+ KGCQIVRVH+V+ N+DIV V+SQL+ 
55 Sbjct: 240 LGKGCQIVRVHDVKANQDIVAVLSQLM 266 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1715

A DNA sequence (GBSx1819) was identified in S.agalactiae <SEQ ID 5335> which encodes the amino
acid sequence <SEQ ID 5336>. Analysis of this protein sequence reveals the following: 



Final Results 

bacterial cytoplasm Certainty=0. 2429 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) <: suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5337> which encodes the amino acid 
sequence <SEQ ID 5338>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0. 1590 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside --- Certainty=0 . 0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 151/184 (82%) , Positives = 166/184 (90%) 

Query: 3 NQEKMEKAIYQFLEALGENPNREGLKDTPKRVAKMyiEMPSGLNQDPKEQPTAVFSENHE 62 

N+EK E AIYQFLEA+GENPNREGL DTPKRVAKMY EMF GL +DPKB+FTAVF E HE 
Sbjct: 16 NKEKAEAAIYQFLEAIGENPNREGLLDTPKRVAKMYAEMFLGLGKDPKEEFTAVFKEQHE 75 

Query: 63 EWIVKDIPFYSMCEHHLVPFYGKAHIAYLPNDGRVTGLSKLARAVEVASKRPQLQERLT 122 

+WIVKDI FYS+CEHHDVPFYGKAHIAYDP+DGRVTGLSKLARAVEVASKRPQLQERLT 
Sbjct: 76 DWIVKBISFYSICEHHDVPFYGKAHIAYLPSDGRVTGLSKLARAVEVASKRPQLQERLT 135 

Query: 123 AQVAQALEDAIiAPKGIFVMlEAEHMCMTMRGIKKPGSKTITTVARGLYKDDRYERQEILS 182 

+Q+A AL +AL PKG VM+EAEHMCKTMRGIKKPGSKTITT ARGLYK+ R ERQE++S 
Sbjct: 136 SQIADALVEALNPKGTLVNVEAEHMCMTMRGI KKPGSKTITTTARGLYKESRAERQEVI S 195 

Query: 183 LIQK 186 
L+ K 

Sbjct: 196 LMTK 199 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
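
Each alignment block in this section consists of a Query row, a middle match row and a Sbjct row, where the match row repeats the residue at identical positions and shows "+" at conservative substitutions. As a minimal sketch, assuming the three rows have been extracted with their original column spacing intact, the Python function below counts identities, positives and gaps for one block. The name `alignment_stats` is hypothetical, and the sketch only works on rows whose spacing has not been disturbed by text extraction.

```python
def alignment_stats(query, match, subject):
    """Count identities, positives and gaps in one aligned three-row block.

    query, match and subject must be the residue strings only (no labels or
    coordinates) and must have equal length.
    """
    if not (len(query) == len(match) == len(subject)):
        raise ValueError("aligned rows must have equal length")
    identities = positives = gaps = 0
    for q, m, s in zip(query, match, subject):
        if q == "-" or s == "-":
            gaps += 1            # gap in either sequence
        elif m == "+":
            positives += 1       # conservative substitution
        elif m != " ":
            identities += 1      # match row repeats the shared residue
            positives += 1       # identities also count as positives
    return {
        "identities": identities,
        "positives": positives,
        "gaps": gaps,
        "length": len(query),
    }


# Final block of the GAS/GBS alignment above (Query 183-186, Sbjct 196-199).
print(alignment_stats("LIQK", "L+ K", "LMTK"))
```

Summing the per-block counts over a whole alignment gives totals that can be compared with the "Identities" and "Positives" figures printed in its header.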

Example 1716 

A DNA sequence (GBSx1820) was identified in S.agalactiae <SEQ ID 5339> which encodes the amino
acid sequence <SEQ ID 5340>. This protein is predicted to be folylpolyglutamate synthase (folC). Analysis
of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2836 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9855> which encodes amino acid sequence <SEQ ID 9856>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB14768 GB:Z99118 folyl-polyglutamate synthetase [Bacillus subtilis]
Identities = 154/426 (36%), Positives = 245/426 (57%), Gaps = 17/426 (3%)

Query: 3 YQEaLEWIHSKLAFGIKPGLERMRWMLEQLGNPQNNLSAIHWGTNGKGSTTSYLQHIFT 62 



Query: 63 NSGYQVGTFTSPYIVDFRERISIDGQMIPESDFIKLVETWPWERLHLETNLEPATEFE 122 

+GY VGTFTSPYI+ F ERIS++G I + ++ LV ++P VE L +T TEFE 
Sbjct: 65 EAGYTVGTFTSPYI I TFNERI SVNG I P I SDEEVJTALVNQMKPHVEALD - QTEYGQPTEFE 123 

Query: 123 VITVLMFYYFGNSCP VDI VI IEAGMGGYYDSTHMFKALAVTCPS IGLDHQE VLGRTYVDI 182 

++T F YF VD VI E G+GG +DSTN+ + L SIG DH +LG T +1 

Sbjct: 124 IMTACAFLYF7AEFHKVDFVIFETGLGGRFDSTNWEPLLTVITSIGHDHMNILGNTIEEI 183 

Query: 183 AEQKVGVLKKGVPFVYANDRQDVEEVFQIICAKETHSQTYRLHNDFYIKEEE NYFN 237 

A +K G++K+G+P V A + + +V + +A+ + LH+ I EE F+ 
Sbjct: 184 AGEKAGI I KEGI PI VTAVTQPEALQVIRHEAERHAAPFQSLHDACVI FNEEALPAGEQFS 243 

Query: 238 YIGPQANIDHIQLQMPGHHQVSNASIAI-TTSLLnRDKYPKLTLQTIKDGLEMTKWVGRT 296 

+ + + 1+ + G HQ NA+++I L ++ ++ + ++ GL W GR 

Sbjct: 244 FKTEEKCYEDIRTSLIGTHQRQNAALSILAAEW1NKENIAHISDEALRSGLVKAAWPGRL 303 

Query: 297 ELI - -FPNVMIDGAHNNESVDALVQVIK-KYQQKWVHILFAAINTKPIESMLESLSSIA- 352 

EL+ P V 4DGAHN E V+ L + +K ++ + ++F+A+ KP ++M++ L +IA 
Sbjct: 3 04 ELVQEHPPVYLDGAHNEEGVEKLAETMKQRFANSRI S WFSALKDKPYQNM I KRLET IAH 363 

Query: 353 PVSVTSFDYPK- SINLDKYPKAYTRVSDWKKWLHDI NLTSDKDFYVITGSLYFIS 406 

+ SFD+P+ S+DY+ W+D+ + + +ITGSLYFIS 

Sbjct: 364 AIHFASFDFPRASLAKDLYDASEISNKSWSEDPDDVIKFIESKKGSNEIVLITGSLYFIS 423 

Query: 407 QVRQEL 412 
+R+ L 

Sbjct: 424 DIRKRL 429

A related DNA sequence was identified in S. pyogenes <SEQ ID 5341> which encodes the amino acid 
sequence <SEQ ID 5342>. Analysis of this protein sequence reveals the following: 

Possible site: 26
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -1.28 Transmembrane 12 - 28 ( 12 - 28)

Final Results

bacterial membrane --- Certainty=0.1510 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 230/411 (55%), Positives = 295/411 (70%), Gaps = 1/411 (0%) 

Query: 1 MTYQEALEWIHSKLAFGIKPGLERMRWNLEQLGNPQNNLSAIHWGTNGKGSTTSYLQHI 60 

MTY+E LEWIH L FGIKPGL+RM W+L QLGNPQ N+ +H+VGTNGKGST ++LQHI 
Sbjct: 34 MTYEETLEWIHDHLVFGIKPGLKR^WVIX^I/3NPQKNVKGWIVGTNGKGSTVNHLQHI 93 

Query: 61 FTNSGYQVGT FTS P Y I VD FRER ISIDGQMIPSSDFI KLVETVRPWERLHLETISILE PATE 120 

FT +GY+VGTFTS PYI +DF+ERIS I 4-G+MI E D + +RP+ ERL ET+ TE 

Sbj Ct : 94 FTTAGYEVGTFTSPYIMDFKERISINGRMISEKDLVIAANRIRPLTERLVQETDFGEVTE 153 

Query: 121 FEVITVLMFYYFGNSCPVDIVIIFAGMGGYYDSTNMFKALAVTCPSIGLDHQEVLGRTYV 180 

FEVIT++MF YFG+ PVDI IIEAG+GG YDSTN+F+A+ V CPSIGLDHQ +LG TY 
Sbjct: 154 FEVITLIMFLYFGDMHPVDIAI IK^.GLGGLYDSTKVFQAMVWCPS IGLDHQAILGETYA 213 



Query: 181 DIAEQKVGVLKKGVPFWANDRQDVEEVFQIKAISTHSQTYRLHNDFYIKEEENYFNYIG 240
+IA QK GVL+ G V+A + EVF KA++ + + F + E+ + + 
Sbjct: 214 NIAAQKAGVLEGGETLVFAVENPSAREVFLTKAEC'VGASIKEWQEQFQMAENASGYRFTS 273

Query: 241 PQANIDHIQLQMPGHHQVSNASlAITTSLLIiRDKYPKLTLQTIKDGLEMTKWVGRTELIF 300 

P 11+ MPGHHQVSNA++AI T L L+D+YP+LT I++GL + W+GRTEL+ 
Sbjct: 274 PLGVISDIHIAMPGHHQVSNAALAIMTCLTLQDRYPRLTPDHIREGLANSLWLGRTELLA 333 

Query: 301 PRWIDGAHNIffiSVDALVQVIK-KYQQKNVHILFAAINTKPIESMLESLSSIAPVSVTSF 359 

PN+MIDGAHNNESV ALV V+K Y K +HILF AI+TKPI ML +L I + VTSF 
Sbjct: 334 PNLMIDGATO^SVAALVAVLKMNY^KKLHIIjFGAIDTKPIADMLVALEQIGDLQVTSF 393 

Query: 360 DYPKSINLDKYPKAYTRVSDWKKWLHDINLTSDKDFYVITGSLYFISQVRQ 410 

YP + L+KYP+ + RV+D+K +L DF+VITGSLYFIS++RQ 
Sbjct: 394 HYPNAYPLEKYPERFGRVADFKDFLALRKHAKADDFFVITGSLYFISEIRQ 444 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1717 

A DNA sequence (GBSx1821) was identified in S.agalactiae <SEQ ID 5343> which encodes the amino
acid sequence <SEQ ID 5344>. This protein is predicted to be rarD. Analysis of this protein sequence
reveals the following:



Possible site: 38
>>> Seems to have a cleavable N-
INTEGRAL Likelihood =-12 Transmembrane 130 - 146 ( 125 - 151)
INTEGRAL Likelihood =-10 Transmembrane 269 - 285 ( 262 - 291)
INTEGRAL Likelihood = -7 Transmembrane 212 - 228 ( 207 - 233)
INTEGRAL Likelihood = -5 Transmembrane 80 - 95 ( 75 - 99)
INTEGRAL Likelihood = -4 Transmembrane 106 - 122 ( 104 - 125)
INTEGRAL Likelihood = -3 Transmembrane 182 - 198 ( 180 - 204)
INTEGRAL Likelihood = -2 Transmembrane - 56 (
INTEGRAL Likelihood = -0 Transmembrane 153 - 169 ( 152 - 169)
INTEGRAL Likelihood = -0.32 Transmembrane 251 - 267 ( 250

Final Results

bacterial membrane --- Certainty=0.5925 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB07585 GB:AP001520 unknown conserved protein [Bacillus halodurans] 
Identities = 109/288 (37%), Positives = 185/288 (63%), Gaps = 6/288 (2%)

Query: 7 GIILGLSAYVLWGLLSLYWKLLSGIEAYSTFAYRIIFTVLTMLIYMLVSGRKTVYLKDLK 66
         G+I +SAY++WG L LYWKL+ + A A+RI++++ M+I + V + ++++
Sbjct: 8 GVIAAISAYLIWGFLPLYWKLVDEVPASEMIAHRIWSLGFIWILLAVMKKNRQVMREIL 67

- VA+ILIS+NW +
EASLGYY+ P+I++LL++-
LA+SFG YGL+KK +SLS+ S+ +E+
L+ SG TA+PLLLFA KR ++I
GF+QY+ PTI L+L +F+F+E
I- F+ IW +++F+I H

No corresponding DNA sequence was identified in S.pyogenes.

A related GBS gene <SEQ ID 8885> and protein <SEQ ID 8886> were also identified. Analysis of this
protein sequence reveals the following:



Lipop: Possible site: -1 Crend: 3
McG: Discrim Score: 5.30
GvH: Signal Score (-7.5): -1.64
Possible site: 38
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 9 value: -12.31 threshold:
modified ALOM score: 2.96

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.5925 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

ORF02052(319 - 1152 of 1485) 

GP|9S5460l|gb|AAF93371.1||AE004110(13 - 289 of 302) rarD protein {Vibrio cholerae}
%Match = 20.4
%Identity = 37.7 %Similarity = 66.3
Matches = 104 Mismatches = 89 Conservative Sub.s = 79



117 147 177 207 237 267 297 327

KDIWLW*R1^K**NKSALKIWRMLLICLEQDRR*WCVRKKKNKQLSQ 

11 = 

MFMTPDQQDAKKGIL 



LGLSAYVLWGLLSLYWKLLSGIEAYSTFAYRI I - - FTVLTML I YMLVSGRKTVYLKDLKGLVNNKKS FWTMFVAS ILI S I 
I :|h =1 = 1 I = I ===!== I =1 =11 = = I I 1 = 1= = II = = l = = 1 = 

LAI SAYTMWGI API YFKALGAVSALE ILSHRWWSFVLLA VL I HLGRRWRS W GWHTPRKFWLLLVTALLVGG 



1SWLVYI FAOTHGHATEASLGYYMPI I S ILLS\nL VXREK 

|||::|::: ■ I =111111= l====ll =1111=: =1= =1 =1111 = I 1 = = = = II 1:1 
NWLIFIWSINA]^IHMLDASLGYYINPLLNVLLGMLFLGERLRKLQWFAV7AIAAIGVGIQLVVFGSVPIVAIALATSFGFYG 



100 110 120 130 140 150 160

831 861 891 921 942 972 1002 1032 

LLKKSISLSSDFSMLVESSFIAPFALIYIVFFAKDFLTDY--NILQL-VLLSLSGIITAVPLLLFAEAIKRAPLNIIGFI 
|| = | I = = = = = 1= 1= I I '-I I II --II :|s»| -.III MM: =11 

LLRKXIQVDAQTGLFLETLFMLPAAAIYLI^^IiADTFTSD^lALl™QImJLLVCAGVVTTLPLLCFTGAAARLKLSTLGFF 



180 190 210 220 230

1062 1092 1122 1152 1182 1212 

QYINPTIQLLLALFIFKETIVSGEVIGFIFIWLAILVFSIGQVHTMLKKGK" 

111 l = = =111 = = = = 1 I = M III l = = = ll = 
QYIGPSLMFLLAVLWGFAFTSDKAITFAFIWSALVIFSVTX3LKAGEIAARRAR 



260 270 280 290 300

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1718 

A DNA sequence (GBSx1822) was identified in S.agalactiae <SEQ ID 5345> which encodes the amino
acid sequence <SEQ ID 5346>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5200 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
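
Each "Final Results" block lists three candidate locations with a certainty value and a call. As an illustration only, the following Python sketch shows one way such a block could be read into a dictionary and the highest-scoring location selected. The regular expression and the function names are assumptions made for this sketch and do not describe the program that produced the output. When every certainty is 0.0000, as in some examples, the choice is not meaningful and the first listed location is returned.

```python
import re

# Hypothetical pattern for lines such as:
#   bacterial cytoplasm --- Certainty=0.5200 (Affirmative) < succ>
RESULT_LINE = re.compile(
    r"bacterial\s+(\w+)\s*-*\s*Certainty\s*=\s*([\d.]+)\s*\(([^)]+)\)"
)

def parse_final_results(text: str) -> dict:
    """Map each location to a (certainty, call) tuple."""
    results = {}
    for location, certainty, call in RESULT_LINE.findall(text):
        results[location] = (float(certainty), call.strip())
    return results

def predicted_location(results: dict) -> str:
    """Return the location with the highest certainty (ties keep listing order)."""
    return max(results, key=lambda location: results[location][0])

block = """
bacterial cytoplasm --- Certainty=0.5200 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""
results = parse_final_results(block)
print(predicted_location(results), results)
```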

Example 1719 

A DNA sequence (GBSx1823) was identified in S.agalactiae <SEQ ID 5347> which encodes the amino
acid sequence <SEQ ID 5348>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0881 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.



Query: 1 MRIIVPATSiMIIGPGFDSIGVALSICYLIIEVLEESTEWLVEHNLvN-IPKDHTNLLIQTA 59 

M+IIVPATSANIGPGFDS+GVA++KYL IEV EE EWL+EH + IP D NLL+ A 
Sbjct: 1 MKIIVPATSANIGPGFDSVGVAVTKYLQIEVS3ERDEKLIEHQIGKWIPHDERNLLLTIA 60 

Query: 60 LHVKSDLAPHRLKMFSDIPLARGLGSSSSVIVAGIELANQLGNLALSQKEKLEIATRLEG 119 

L + DL P RLKM SD+PLARGLGSSSSVIVAGIELANQLG L LS EKL++AT++EG 
Sbjct: 61 LQIVPDLQPRRLKMTSDVPLARGLGSSSSVIVAGIELANQLGQLNLSDHEKLQLATKIEG 120 

Query: 120 HPDNVAPAIFGDLVISSIVKNDIKSLEVMFPDSSFIAFIPNYELKTSDSRNVIiPQKLSYE 179 

HPDWAPAI +G4LVI+ S V+ + ++ FP+ F+A+IPNYEL+T DSR+VLP+KLSY+ 
Sbjct: 121 HPDNVAPAIYGNLVIASSVEGQVSAIVADFPECDFLAYIPNYELRTRDSRSVLPKKLSYK 180 

Query: 180 DAVASSSVANVWASLLKGDLVTAGWAIERDLFHERYRQPLVKEFEVIKQISTQNGAYAT 239 

+AVA+SS+ANV VA+LL GD+VTAG AIE DLFHERYRQ LV+EF +IKQ++ +NGAYAT 
Sbjct: 181 EAVAASSIANVAVAALLAGDMVTAGQAIEGDLFHERYRQDLVREFAMIKQVTKENGAYAT 240 

Query: 240 YLSGAGPTVMVl.CSKEKEQAIVTELSKLCLGGQIQVtNIERKGVRVEKR 288 

YLSGAGPTVMVL S +K I EL K G++ L ++ +GVRVE + 
Sbjct: 241 YLSGAGPTOWLASHDKMPTIKAELEKQPFKGKLHELRTOTQGVRVEAK 289 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1720 

A DNA sequence (GBSx1824) was identified in S.agalactiae <SEQ ID 5349> which encodes the amino
acid sequence <SEQ ID 5350>. This protein is predicted to be homoserine dehydrogenase (hom). Analysis
of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9857> which encodes amino acid sequence <SEQ ID 9858> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



Query: 15 MTIKIALLGFGWAKGIPYLLKENQHKLLSLEGEDIVIDKVLVRDNESRQRFINQGFTYN 74 

M + IA+LGFGTV G+P LL EN+ KL + E+IVI KVL+RDN++ ++ +QGF Y+ 
Sbjct: 1 MAWIAILGFGWGTGLPTLLSENKEKLAKILDEEIVISKVLMRDNKAIEKARSQGFNYD 60 

Query: 75 FVTEINTILQDSQIDIVVELMGGIEPAKTYLSQALGFGKHIVTANKDLIALHGKELMDLA 134 

FV ++ IL DS+I IWELMG IEPAKTY++QA+ GK++VTANKDL+A+HG EL LA 
Sbjct: 61 FATLNLDDILADSEISIVVEIJ^GRIEPAKTYITQAIEAGKNVVTANKDLLAVHGVELRSIjA 120 

Query: 135 DARGIiALFYEGAVAGGlPILRTLSHSFASDKMTRLLGILNGTSNFMLTKMFEEGWSYEQA 194 

+AL+YE AVAGGIPILRTL++SF+SDK+T LLGILNGTSNFM+TKM EEGW+Y+++ 
Sbjct: 121 QKHHVALYYEAAVAGGIPILRTLANSFSSDKITHLLGILNGTSNFMMTKMSEEGWTYDES 180 

Query: 195 LKKAQELGYAESDPTNDVEGIDTAYKATILSQFGFGMPIDFDDVNYKGISSIRSEDVEVA 254 

L KAQELGYAESDPTNDV+GID +YK ILS+F FGM + DD+ G+ SI+ DVE+A 
Sbjct: 181 LAKAQELGYAESDPTNDVDGIDASYKLAILSEFAFGMTLAPDDIAKSGLRSIQKTDVEIA 240 

Query: 255 QEMGFAIKLVADLRETPTGISVDVSPTLISQKHPLAAVIvIHVMNAVFIESIGIGQSLFYGP 314 

Q+ G+ +KL ++ E +GI +VSPT + + HPLA+VN VMNAVFIES GIG S+FYG 
Sbjct: 241 QQFGYVLKLTGEINEVDSGIFAEVSPTFLPKSHPLASVNGVMNAVFIESEGIGDSVFYGA 300 

Query: 315 GAGQNPTATSVLADI IDI SRSIRSQI XI KPMNTYHCPCRLSMQSDI FNEYYLAISLRNAE 374 

GAGQ PTATSVLADI+ I + ++ K NY L+ DI N+YY ++ E 
Sbjct: 301 GAGQKPTATSVLADIVRIVKRVKDGTIGKSFNEYARSTSIiANPHDIENKYYFSV E 355 

Query: 375 DSDTLGR YFEQENIGLKNVIEKALGDKQQEIYVLTDEVSQEKITQFIEEFPESG 428 

D+ G+ F EN+ + V+++ K+ + +++ ++++ +++ ++ + 

Sbjct: 356 TPDSTGQLLLLVELFTSENVSFEQVLQQKGNGKRAVWIISHKINRVQLSAIQDKLNQEK 415 



Sbjct: 416 DFKLLNRFKVLG 427 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1721

A DNA sequence (GBSx1825) was identified in S.agalactiae <SEQ ID 5351> which encodes the amino
acid sequence <SEQ ID 5352>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4548 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1722 

A DNA sequence (GBSx1826) was identified in S.agalactiae <SEQ ID 5353> which encodes the amino
acid sequence <SEQ ID 5354>. Analysis of this protein sequence reveals the following:

Possible site: 29
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood =-12.79 Transmembrane 20 - 36 ( 14 - 41)

Final Results

bacterial membrane --- Certainty=0.6116 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
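
The INTEGRAL lines in these analyses follow a regular layout: a likelihood, a predicted transmembrane span, and a wider range in parentheses. The sketch below is a hedged Python illustration of how such lines might be read into records. The `Segment` type, its field names and the pattern are hypothetical and introduced only for this example. The two sample lines are copied from Examples 1716 and 1722.

```python
import re
from typing import NamedTuple

class Segment(NamedTuple):
    likelihood: float   # value printed after "Likelihood ="
    start: int          # first residue of the predicted span
    end: int            # last residue of the predicted span
    range_start: int    # first residue of the parenthesised range
    range_end: int      # last residue of the parenthesised range

# Assumed layout: INTEGRAL Likelihood =<x> Transmembrane a - b ( c - d)
SEGMENT_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_segments(text: str) -> list:
    """Return one Segment per well-formed INTEGRAL line in the text."""
    return [
        Segment(float(lik), int(a), int(b), int(c), int(d))
        for lik, a, b, c, d in SEGMENT_LINE.findall(text)
    ]

sample = """
INTEGRAL Likelihood =-12.79 Transmembrane 20 - 36 ( 14 - 41)
INTEGRAL Likelihood = -1.28 Transmembrane 12 - 28 ( 12 - 28)
"""
segments = parse_segments(sample)
span_lengths = [s.end - s.start + 1 for s in segments]
print(segments, span_lengths)
```

Lines whose numbers were lost in transcription do not match the pattern and are skipped rather than guessed.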

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15906 GB:Z99123 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 105/272 (38%), Positives = 149/272 (54%), Gaps = 20/272 (7%)

Query: 25 FLLIALIGIFLFFNNRSKQEIKT KTNASSHRKIVTSIKKKK WIKQKTPVK 74
          FL I L+G h + QE K K ++KK+ WIK + P K
Sbjct: 5 FLSIFLLGSCLALAACADQEANAEQPMPKAEQKKPEKKAVQVQKKEDDTSAWIKTEKPAK 64

+K + ITFDDG D Y AYP+LKKY +KAT +1 + G + +L +QM EM Q+G+
h HT+ H L+ LTP+ Q EM SK+ D Q T I+YP GRYN TL A 4

            Y++G+TT G A++D G+ +L+R+R+ P S
Sbjct: 236 GYQMGVTTEPGAASRDQGMYALHRVRVSPGMS 267

A related DNA sequence was identified in S.pyogenes <SEQ ID 5355> which encodes the amino acid 
sequence <SEQ ID 5356>. Analysis of this protein sequence reveals the following: 

Possible site: 24
>>> May be a lipoprotein

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:CAB15906 GB:Z99123 similar to hypothetical proteins [Bacillus subtilis]
Identities = 97/240 (40%), Positives = 140/240 (57%), Gaps = 9/240 (3%)

I- D GY L+P+E L+ ++ P++K V +TFDD D Y AYP+LKKY KAT

            +Q T I+YP GRYN+ TL+- A + Y4-+GVTT G AS G+ +L+R+R+ P MS
Sbjct: 208 4FHQQTTI I S YPVGRYNEETLKAAEKTGYQMGVTTEPGAftSRDQGMYALHRVRVS PGMS 267

An alignment of the GAS and GBS proteins is shown below. 

Identities = 153/265 (57%) , Positives = 199/265 (74%) , Gaps = 4/265 (1%) 

Query: 33 IFLFFNNRSKQEIKTK T^SSHRKIVTSIKKKKMIKQKTPVKIPILMYHAVHVMDPS 89 

I LF + ++ ++ TK T+ S + + K W KQ+TPVKI PILMYHA+HVM P 
Sbjct: 54 ISLFHHKKTAKKETTKLKKTHFDSSKSQKKAHSKLTWTKQBTPVKIPILMYHAIHVMSPE 113 

Query: 90 FjyiSANLIVAPDIFESHIKRLKKEGYYFLAPSEAYRAIjNENALPEKKVIWITFDDGNADF 149 

E A+ANLIV PD+F+ ++++K EGYYFL+P E YRAL+ N LP KKV+W+TFDD DF 
Sbjct: 114 ETANANLIWPDLFDQQLQKMKDEGYYFLSPEEVYRALSNNELPAKKWWLTFDDSMIDF 173 

Query: 150 YTKAYPILKKYKVKATNNIITGBVQEGRESMiWQQMLEMKQNGMSFQGHTOTHPNLSLL 209 

Y AYPILKKY KATNN+ITG + G +NL ++QM EMKQ GMSFQ HTV HP+L 
Sbjct: 174 YNVAYPILKKYDAKATNNVITGLTEMGSAANLTLKQMKEMKQVGMSFQDHTVNHPDLEQA 233 

Query: 210 TPELQTQEMTLSKQFLDQKLSQDTLAIAYPSGRYNPTTLDIASQY-YJCLGLTTNEGVATK 268 

+P++QT EM SK +LD++L+Q+T+A1AYPSGRYN TTL IA++ YKLG+TTNEG+A+ 
Sbjct: 234 SPDVQTTEMKDSKDYLDKQIjNQNTIAIAYPSGRYNDTTLQIAARLNYKLGVTTNEGIASA 293 

Query: 269 DNGLLSLNRIRILPTTSDDDLIKTI 293 

NGLLSLNRIRILP S ++L++T+ 
Sbjct: 294 ANGLLSLNRIRILPNMSPENLLQTM 318 

SEQ ID 5354 (GBS287d) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total 
cell extract is shown in Figure 145 (lane 3 & 4; MW 57kDa) and in Figure 185 (lane 2; MW 57kDa). It was 
also expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 
145 (lane 6; MW 32kDa) and in Figure 181 (lane 5; MW 32kDa). 

Purified GBS287d-GST is shown in Figure 243, lanes 10-11; purified GBS287d-His is shown in Figure 234, 
lanes 7-8. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1723

A DNA sequence (GBSx1828) was identified in S.agalactiae <SEQ ID 5357> which encodes the amino
acid sequence <SEQ ID 5358>. Analysis of this protein sequence reveals the following:

Possible site: 21
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1724 

A DNA sequence (GBSx1829) was identified in S.agalactiae <SEQ ID 5359> which encodes the amino
acid sequence <SEQ ID 5360>. Analysis of this protein sequence reveals the following:

Possible site: 40
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3352 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1725 

A DNA sequence (GBSx1830) was identified in S.agalactiae <SEQ ID 5361> which encodes the amino
acid sequence <SEQ ID 5362>. This protein is predicted to be glycine betaine transporter BetL (opuD). 
Analysis of this protein sequence reveals the following: 

Possible site: 61
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood =-12.68 Transmembrane 439 - 455 ( 435 - 491)
INTEGRAL Likelihood =-12.10 Transmembrane 256 - 272 ( 249 - 281)
INTEGRAL Likelihood =-11.30 Transmembrane 464 - 480 ( 456 - 491)
INTEGRAL Likelihood =-10.83 Transmembrane 49 - 65 ( 44 - 74)
INTEGRAL Likelihood =-10.40 Transmembrane 11 - 27 ( 5 - 34)
INTEGRAL Likelihood = -9.98 Transmembrane 396 - 412 ( 390 - 419)
INTEGRAL Likelihood = -9.29 Transmembrane 224 - 240 ( 220 - 247)
INTEGRAL Likelihood = -7.11 Transmembrane 347 - 363 ( 341 - 366)
INTEGRAL Likelihood = -2.87 Transmembrane 143 - 159 ( 143 - 159)
INTEGRAL Likelihood = -2.60 Transmembrane 192 - 208 ( 191 - 208)
INTEGRAL Likelihood = -1 Transmembrane 86 - 102 ( 86 - 105)

Final Results

bacterial membrane --- Certainty=0.6074 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.

>GP:AAD30266 GB:AF102174 glycine betaine transporter BetL [Listeria monocytogenes]
Identities = 277/503 (55%), Gaps = 1/503 (0%)

Query: 4 KHITPVFTGSLIVSLILVLLGIIVPRGFQSWTQILREQVSTNFGWLYLLLVTSILALCVF 63 

K +T VF GS + L+ VL G +P F+++T +++ +++NFGW YL++V 1+ C+F 
Sbjct: 2 KKLTNVFWGSGFLVLLAVLFGAFLPEQFETFTNHIQKFLTSNFGWYYLIWAIIIIFCLF 61 

Query: 64 FIMSPLGQIRIX3QPHSRPEYSTVSWIAMMFSAGMGIGLVFYGAAEPLSHFAISTPGAPKE 123 

++SP+G 1RLG+P P YS SW AM+FSAGMGIGLVF+GAAEPLSH+A+ PG 
Sbjct: 62 LVLSPIGS1RLGKPGEEPGYSNKSWFAMLFSAGKGIGLVFWGAAEPLSHYAVQAPGGEVG 121 

Query: 124 SQTALADAFRFTFFHWGIHAWAVYALVALALAYFGFRKQEKYLLSVTLKPLFGDKTDGWL 183 

+Q A+ DA R++FFHWGI AW++YA+VALALAYF FRK L+S TL P+ G G + 
Sbjct: 122 TQAAMKDALRYSFFHWGISAWSIYAIVALALAYFKFRKNAPGLISATLYPILGKHAKGPI 181 

Query: 184 GKIVDITTWATVIGVATTLGFGAAQINGGLSFLLGVPNNAFVQIVIILITTALFVMSAL 243 

G+++DI V ATVIGVATTLG GA QINGGL++L GVPNN VQ II+I T LF++SA+ 
Sbjct: 182 GQLIDIIAVFATOIGVATTLGLGAQQINGGLTYLFGVPNNFTVQFTIIVIVTILFMLSAM 241 

Query: 244 SGLGKGVKILSNLNLILAVALLALVIVLGPTVRIFDTLTESLGSYLQNFFGMSFRAAAFD 303 

SGL KG+++LSN+N+ +A LL L ++LGPT+ I + T S G YLQN MSF+ A 
Sbjct: 242 SGLDKGIQLLSNVNIYVAGVLLVLTLILGPTLFIMNNFTNSFGDYLQNIIQMSFQTAPDA 3 01 

Query: 304 NTKRSWIDNWTIFYWAWWISWSPFVGVFIARISKGRSIREFLTWLLIPTLLSBWFAAF 363 

R WID+WTIFYWAWW+SWSPFVG+FIARIS+GR+IR+FL V+++P L+S WFA F 
Sbjct: 302 PDARKWIDSWTIFYWAWWLSWSPFVGIFIARISRGRTIRQFLLGVIVLPALVSVFWFAVF 361 

Query: 364 GTLSTQVQQLG-TNLTKFATEEVLFATFNHYTLGWLLSIIAIILIFSFFITSADSATYVL 422 

G + V+Q G + L+ ATE+VLF FN + G +LSI+A+ILI FFITSADSAT+VL 
Sbjct: 362 GGSAIFVEQHGNSGLSSLATEQVLFGVFNEFPGGMMLSIVAMILIAVFFITSADSATFVL 421 

Query: 423 AMLTEDGNIJ'IPKNRTKVIWGLVLAVIAIVLLLSGGLLALQNVLIIVALPFSFVMILMMLA 482 

M T G+LNP N KV WGL+ A IA VLL +GGL ALQN II A PFS V+ILM+++ 
Sbjct: 422 GMQTTGGSLNPPNSVKVTWGLLQAGIASVLLYAGGLTALQNASIIAAFPFSIVIILMIVS 481 

Query: 483 LLVELFHEKKEMGLS I S PDRYPR 505 

L V L E++++GL + P + R 
Sbjct: 482 LFVSLTREQEKLGLYVRPKKSQR 504 



No corresponding DNA sequence was identified in S.pyogenes.

A related GBS gene <SEQ ID 8887> and protein <SEQ ID 8888> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 4
McG: Discrim Score: 15.28
GvH: Signal Score (-7.5): -4.24
Possible site: 61
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: value: -12.68 threshold:

*** Reasoning Step: 3

Final Results 

bacterial membrane Certainty=0. 6074 (Affirmative) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

ORF02057(310 - 1821 of 2145) 

GP|4835822|gb|AAD30266.1|AF102174_1|AF102174(2 - 506 of 507) glycine betaine transporter
BetL {Listeria monocytogenes} PIR|T48645|T48645 glycine betaine transport protein betL
[validated] - Listeria monocytogenes
%Match = 38.7
%Identity = 54.9 %Similarity = 74.7
Matches = 277 Mismatches = 127 Conservative Sub.s = 100

54 84 114 144 174 204 234 264 

IQGGHHYR1WRLEVLKIQDMWS*ANLDLMPLSTNIWYLHQIVINH* 

294 324 354 384 414 444 474 504 

KVCYTILV*EEILSKKHITPVFTGSLIVSLILVLLGIIVPRGFQSWTQILREQVSTNFGWLYLLLVTSILALCVFFIMSP 
I =1 II II = I: 11 = 1 :| |:::| = = = ===1111 ll = = 1 1 = =l=l===ll 

MKKLTNVFWGSGFLVLLAVLFGAFLPEQFETFTNHIQKFLTSNFGWYYLI WAI III FCLFLVLSP 



LGQIRLGQPHSRPEYSTVSWIAMMFSAGMGIGLVFYGAAEPLSHFAI STPSAPKESQTALADAFRFTFFHWGIHAVJAVYA 
= 1 1111 = 1 I II II 11=11111111111=11111111=1= II «l ! = ll=l==llllll ||:<|| 

IGSIRLGKPGEEPGYSNKSWFAMLFSAGMGIGLVFWGAAEELSHYAVQAPGGEVGTQAAMKDALRYSFFHWGISAWSIYA 



LVALALAYFGFRKQEKYLLSVTLKPLFGDKTDGWLGKIVDITT\ r VATVIGVATTLGFGAAQINGGLSFLLGVPNNAFVQI 
= 11111111 III hi II l = = l I =l = = = ll I 11111111111=11 lllll|: = l = lllll II 
IVAIAIAYFKFPJ<NAPGLISATLYPILGKHAKGPIGQLIDIIAVFATVIGVATTLGLGAQQINGGLTYLFGVPNNFTVQF 
160 170 180 190 200 210 220 

1014 1044 1074 1104 1134 1164 1194 1224 

VIILITTALFVMSALSGLGKGVKILSNraLILAVALLALVIVLGPTVRIFDTLTESLGSYLQNFFGMSFRAAAFDNTKRS 

11 = 1 I ll = = ll = lll l|:::|||:|: =1 II I ==1111= I = =1 1 = 1 III 111= I I 
TIIVIVTILFMLSAMSGLDKGIQLLSNVNIYVAGVLLVLTLILGPTLFIMNNFTNSFGDYLQNIIQMSFQTAPDAPDARK 
240 250 260 270 280 290 300 

1254 1284 1314 1344 1374 1404 1431 1461 

WIDNWTIFYWAWWISWSPFVGVFIARISKGRSIREFLTWLLIPTLLSFVWAAFGTLSTQVQQLGTN-LTKFATEEVLF 

lll=MIIIIIII=lllllll=llllll=ll=ll=ll l===l 1=1 III II = 1=11= 1= =111=111 
WIDSWTIFYWAWWLSWSPFVGIFIARISRGRTIRQFLLGVIVLPALVSVFWFAVFGGSAIFVEQHGNSGLSSLATEQVLF 
320 330 340 350 360 370 380 

1491 1521 1551 1581 1611 1641 1671 1701 

ATFNHYTLGWLLSIIAIILIFSFFITSADSATYVliAMLTEDGNI^PKNRT 

ii = i =111 = 1 = 111 niiiinihii i i 1 = 111 i ii iii= i ii iii =in mi ii 

GVFNEFPGG^LSIVAMILIAVFFITSADSATFVLGMQTTGGSLNPPNSVKVTWGLLQAGIASVLLYAGGLTALQNASII 
400 410 420 430 440 450 460 

1731 1761 1791 1821 1851 1881 1911 1941 

VALPFSFVMILMMLALLVELFHEKKEMGLSISPDRYPRKNEPFKSYEE*KEARRLLFIG*SS*SDHHR**LVRYEFD*EK 

1=111 l=lll===l=l I l====ll =1=1 
AAFPFSIVIILMIVSLFVSLTREQEKLGLYVRPKKSQRSQL 
480 490 500 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1726

A DNA sequence (GBSx1831) was identified in S.agalactiae <SEQ ID 5363> which encodes the amino
acid sequence <SEQ ID 5364>. This protein is predicted to be succinic semialdehyde dehydrogenase
(gabD-1). Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2733 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9859> which encodes amino acid sequence <SEQ ID 9860> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD19405 GB:AF102543 succinic semialdehyde dehydrogenase 
[Zymomonas mobilis] 
Identities = 229/455 (50%), Positives = 305/455 (66%), Gaps = 5/455 (1%)

Query: 10 MAYKTIYPYTNEVLHEFDNISDSDLEQSLDIAHALYKTWRKEDNVEERQNQLHKVADLLR 69
          MAY+++ P T E + ++ + SD ++ S+D A ++K + + ER LHK A++ R
Sbjct: 1 MAYES WPATGETVKKYPDFSDKQVKDSVDPJUVTVFKMDWSQRTIAERSKVLHKAAEIFR 60

Query: 70 KDRDKYAEVMTKDMGKLFTEAQGEVDLCADIADYYADNGQKFLKPVPLESPNGEAYYLKQ 129
          D DKYA+++T DMGK EA+GEV+L ADI DYYA NG+KFL P +E G A
Sbjct: 61 SDVDKYAKLLTIDMGKKIAEARGEVNIiSADILDYYAKNGEKFLAPQKVEEKPG-AVVKAF 119

D D L+ K + RL+NAGQV ++KRFI++ KEF + + F++ K GDPMD
T L PLSS GA+D V+KQ++ AV +GA++V G 1+ G F+ +LT+I + NP Y
+E FGP+A IY V E EAI LANDS YGLG VF+ D E +KVA QIETGM IN



A related DNA sequence was identified in S.pyogenes <SEQ ID 5365> which encodes the amino acid 
sequence <SEQ ID 5366>. Analysis of this protein sequence reveals the following: 

Possible site: 27
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2887 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 335/457 (73%) , Positives = 397/457 (86%) 

Query: 9 ItffiYKTIYPYTNEVLHEFDNISDSDLEQSLDIAHALYKTWRKEDNVEERQNQLHKVADIjIj 68 

4MAY+TIYPYTNEVLH FDN++D L L+ AH LYK WRKED++EER+ QLH+VA++L 
Sbjct: 1 VmYQTIYPYTI^VLHTFDNMTDQGIjyDVLERAHLLYKKWRKEDHLEERKAQLHQVANIL 60 

Query: 69 RKDRDKYAEVMTKDMGKLFTEAQGEVDLCADIADYYADNGQKFLKPVPLESPNGEAYYLK 128 

R+DRDKYAE+MTKDMGKLFTEAQGEV+liCADIADYYAD +FL PLE+ +G+AYYLK 
Sbjct: 61 RRDRDKYAEIMTKDMGKLFTEAQGEVNLCADIADYYADKADEFLMSTPLETDSGQAYYLK 120 

Query: 129 QAVGVLLAVEPWNFPFYQIMRVFAPNFIVGNTMLLKHASICPASAQAFEDLVREAGAPEG 188 

Q+ GV+LAVEPWNFP+YQIMRvFAPNFIVGN M+LKHASICP SAQ+FE+LV EAGA G 
Sbjct: 121 QSTGVILAVEPWNFPYYQIMRVFAPNFIVGKPMVLKHASICPRSAQSFEELVLEAGAEAG 180 

Query: 189 AFKNIFASYDQVSNLISDPRVAGVCLTGSERGGASIAAEAGKNLKKSSMELGGNDAFLIL 248 

+ N+F SYDQVS +I+D RV GVCLTGSERGGASIA EAGKNLKK+++ELGG+DAF+IL 
Sbjct: 181 S1TNLFISYDQVSQVIADKRWGVCLTGSERGGASIAEEAGKNLKKTTLELGGDDAFIIL 240 

Query: 249 DDADFDLLSKTIFFARLYNAGQVCTSSKRFIVMADKYDEFVM/IWETFKSAKWGDPMDSE 308 

DDAD+D L K ++F+RLYNAGQVCTSSKRFIV+ YD F ++ + FK+AKWGDPMD E 
Sbjct: 241 DDADITOQLEKVLYFSRLYNAGQVCTSSKRFIVLDKDYDRFKELLTKVFKTAKWGDPMDPE 300 

Query: 309 TTIAPIjSSAGAKDDVLKQIKLAVDHGAEWFGNDTIDHPGNFVMPTVLTNITKANPIYNQ 368 

TTLAPLSSA AK DVL QIKLA+DHGAE+V+G + IDHPG+FVMPT++ +TK NPIY Q 
Sbjct: 301 TTIAPLSSAQAIfADVLDQIKLALDHGAELVYGGEAIDHPGHFVMPTIIAGLTKDNPIYYQ 360 

Query: 369 EIFGPVASIYKVDTEEFAIALANDSSYGLGSTVFSSDPEHAKKVAAQIETGMTFINSGWT 428 

EIFGPV IYKV +EEEAI +ANDS+YGLG T+FSS+ EHAK VAA+IETGM+FINSGWT 
Sbjct: 361 EIFGPVGEIYKVSSEEEAIEVANDSNYGLGGTIFSSNQEHAKAVAAKIETGMBFINSGWT 420 

Query: 429 SLPELPFGGIKNSGYGRELSQLGFDAFVNEHLVFTPN 465 

SLPELPFGGIK+SGYGRELS+LGF +FVNEHL++ PN 
Sbjct: 421 SLPELPFGGIKHSGYGRELSELGFTSFVNEHLIYIPN 457 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1727 

A DNA sequence (GBSxl832) was identified in S.agalactiae <SEQ ID 5367> which encodes the amino 
acid sequence <SEQ ID 5368>. Analysis of this protein sequence reveals the following: 

Possible site: 31 , 
»> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside Certainty=0 . 3000 (Affirmative) < suco 

bacterial membrane --- Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not • Clear) < suco 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S. pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
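
The "Final Results" blocks in these examples all follow one pattern: a localisation site, a certainty score and a call in parentheses. A minimal Python sketch for reading such a block is given below. It assumes only the line layout visible in this document, tolerates the stray spaces that scanning introduces inside the numbers, and uses hypothetical helper names (`parse_final_results`, `best_site`) that are not part of any prediction program.

```python
import re

# Matches lines such as
#   "bacterial outside --- Certainty=0 . 3000 (Affirmative) < suco"
# including the stray spaces that scanning introduces inside the number.
RESULT_RE = re.compile(
    r"bacterial\s+(?P<site>\w+)\W*Certainty\s*=\s*"
    r"(?P<value>[\d .]+?)\s*\((?P<call>[^)]*)\)"
)


def parse_final_results(text):
    """Return {site: (certainty, call)} for every 'bacterial ...' line."""
    results = {}
    for match in RESULT_RE.finditer(text):
        value = float(match.group("value").replace(" ", ""))
        results[match.group("site")] = (value, match.group("call").strip())
    return results


def best_site(results):
    """Return the site with the highest certainty score."""
    return max(results, key=lambda site: results[site][0])


example = """
bacterial outside --- Certainty=0 . 3000 (Affirmative) < suco
bacterial membrane --- Certainty=0 . 0000 (Not Clear) < suco
bacterial cytoplasm --- Certainty=0 . 0000 (Not Clear) < suco
"""
parsed = parse_final_results(example)
print(best_site(parsed), parsed)
```

Applied to the block of Example 1727, the sketch reports "outside" as the top-scoring site, in line with the cleavable signal sequence predicted for that protein.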
Example 1728

A DNA sequence (GBSx1833) was identified in S. agalactiae <SEQ ID 5369> which encodes the amino
acid sequence <SEQ ID 5370>. Analysis of this protein sequence reveals the following:

Possible site: 41 

>>> Seems to have a cleavable N-term signal seq.
INTEGRAL Likelihood = ... Transmembrane 286 - ... (eight INTEGRAL entries; remaining values illegible)

Final Results

bacterial membrane --- Certainty=0.4163 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9861> which encodes amino acid sequence <SEQ ID 9862> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC75219 GB:AE000305 orf, hypothetical protein [Escherichia coli K12] 
Identities = 102/331 (30%) , Positives = 172/331 (51%) , Gaps = 26/331 (7%) 



[Query/Sbjct rows of this alignment are illegible; surviving match-line fragments:]

+Y+L+ ++L GF L Q+ VGIS + I ++T+S + ++A L QK+F LDK +

+KI P F + FI+ ++

A+G+ T+VS L K G K



A related DNA sequence was identified in S. pyogenes <SEQ ID 5371> which encodes the amino acid
sequence <SEQ ID 5372>. Analysis of this protein sequence reveals the following:

Possible site: ...
>>> Seems to have ... cleavable N-term signal seq.

[Eight INTEGRAL entries; likelihood values illegible apart from two readings of -2. Surviving transmembrane ranges:]

Transmembrane ... - 46 ( ... - ...)
Transmembrane ... - 330 ( 311 - ...)
Transmembrane ... - 24 ( 7 - ...)
Transmembrane 150 - 166 ( 146 - ...)
Transmembrane ... - 273 ( 252 - ...)
Transmembrane ... - 107 ( 87 - 108)
Transmembrane ... - 85 ( 68 - ...)
Transmembrane ... - 305 ( 289 - ...)
Final Results

bacterial membrane --- Certainty=0.4715 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAC75219 GB:AE000305 orf, hypothetical protein [Escherichia 
coli] 

Identities = 100/329 (30%), Positives = 173/329 (52%), Gaps = 21/329 (6%) 

Query: 8 LPGLLLCLLLALPAWCLGRLFPIIGAP VFAILLGMLLA-LFYEHRDKTKEG-ISFT 61 

+PGL L ++ A G + + GA AILLGM+L Y H K+ +G + F 

Sbjct: 17 IPGLALSAVITGVALWGGSIPAVAGAGFSALTLAILLGtWLGNTIYPHIWKSCDGGVLFA 76 

Query: 62 SKYILQTAWLLGFGLNLTQVMAVGMQSLPI 1 1ST IATALLVAYGL -QKWLRLDVNTATL 120 

+Y+L+ ++L GF L +Q+ VG+ + I + T+++ L+A L QK LD +T+ L 
Sbjct: 77 KQYLLRLGIILYGFRLTFSQIADVGISGIIIDVLTLSSTFLLACFLGQKVFGLDKHTSWL 136 

Query: 121 VGVGSSICGGSAVAATAPVIKAKDDEVAKAISVIFLFNMIAALLFPSLGQLLG--LSNEG 178 

+G GSSICG +AV AT PV+KA+ +V A++ + +F +A L+P++ L+ S E 
Sbjct: 137 IGAGSSICGAAAVIATEPVVKAEASK^TVAVATWIFGTVAIFLYPAIYPLMSQWFSPET 196 

Query: 179 FAIFAGTAVNDTSSVTATATAWDALHHSNTTjDGATlVKLTRTLAILPITLGLSLYRAKKE 238 

F 1+ G+ V++ + V A A + + AIK+R + + P +L+ RK+ 

Sbjct: 197 FGIYIGSTVHEVAQWAAGHAIS PDAENAAVISKT/ILRVMMLAPFLILIAA-RVKQL 251 

Query: 239 HDIVTEENFSLRKSFPRFILFFLLASLITTLMTSLGVSADSFHYLKTLSKFFIVMAMAAI 298 

+ E ++PF + F+++++ + +LTL F + MAMAA+ 

Sbjct: 252 SGANSGEKSKI - -TIPWFAILFIWAIFNSFHL LPQSWNMLVTLDTFLLAMftMAAL 305 

Query: 299 GLNTNLVKL I KTGGQAI LLGAI - - CWVAI 325 

6L T++ L K G + +L+ + W+ + 
Sbjct: 307 GLTTHVSALKKAGAKPLLMALVLFAWLIV 335 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 225/333 (67%), Positives = 277/333 (82%), Gaps = 3/333 (0%) 

KIPGLILCFIIAIPSWLLGLYLPLIGAPVFAILIGIIVGSFYQNRQLFNKGIAFTSKYIL 70 
K+PGL+LC ++A+P+W LG P+IGAPVFAIL+G+++ FY++R +GI+FTSKYIL 
KLPGLLLCLLLALPAWCLGRLFPIIGAPVFAILLGMLLALFYEHRDKTKEGISFTSKYIL 66 

Query: 71 QTAVVLLGFGLNLMQVMKVGISSLPIIIMTISISLIIAYVLQKLFKLDKTIATLIGVGSS 130
QTAWLLGFGIiNL QVM VG+ SLPIII TI+ +L++AY LQK +LD ATL+GVGSS 
QTAWLLGFGIjNLTQVMAVGMQSLPIIISTIATALLVAYGLQKWLRLDVNTATLVGVGSS 126 

Query: 131 ICGGSAIAATAPVINAKDDEVAQAISVIFLFNILAALIFPTLGNFIGLSDHGFALFAGTA 190 

ICGGSA+AATAPVI AKDDEVA+AISVIFLFN+LAAL+FP+LG +GLS+ GFA+FAGTA 
Sbjct: 127 I CGGSAVAATAPVIKAKDDEVAKAISVIFLFNMLAALLFPSLGQLLGIjSNEGFAI FAGTA 186 

Query: 191 VM3TSSVTATATAWDAINHSNTLGGATIVKLTRTLAIIPITIVLSIYHMKQTQ---KEQS 247 

VNDTSSVTATATAWDA++HSNTL GATIVKLTRTLAI+PIT+ LS+Y K+ E++ 
Sbjct: 187 VNDTSSVTATATATOALHHSNTLIX3ATIVKLTRTLAILPITLGLSLYRAKKEHDIVTEEN 245 

Query: 248 VSVTKIFPKFVTjYFIIASLLTTIVASLGFSLRIFEPLKVLSKFFIVMAMGAIGINTNVSK 307 

S+ K FP+F+L+F+LASL+TT++ SLG S F LK LSKFFIVMAM AIG+NTN+ K 
Sbjct: 247 FSLRKSFPRFILFFLIASLITTLMTSLGVSADSFHYLKTLSKFFIVMAMAAIGLNTNLVK 306 

Query: 308 LIKTGGKSILLGAACWLGIIIVSLTMQAILGTW 340 

LIKTGG++ILLGA CW+ I +VSL MQ LG W 
Sbjct: 307 LIKTGGQAI LLGAI CWVAITLVSLAMQLSLGIW 339 
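
Each alignment in this document is headed by a summary line of the form "Identities = a/b (p%), Positives = c/b (q%), Gaps = g/b (r%)". The following Python sketch shows one way such a line could be read back into numbers. The function name `parse_blast_summary` and the dictionary layout are illustrative assumptions, not part of the alignment program's own interface.

```python
import re

# One "Name = count/length (pct%)" field; whitespace is optional everywhere
# because the scanned text spaces these fields inconsistently.
SUMMARY_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_blast_summary(line):
    """Return {field: {count, length, reported_pct, fraction}} for one summary line."""
    stats = {}
    for name, count, length, pct in SUMMARY_RE.findall(line):
        count, length = int(count), int(length)
        stats[name] = {
            "count": count,
            "length": length,
            "reported_pct": int(pct),      # percentage as printed in the listing
            "fraction": count / length,    # recomputed from the two counts
        }
    return stats


summary = parse_blast_summary(
    "Identities = 225/333 (67%), Positives = 277/333 (82%), Gaps = 3/333 (0%)"
)
for name, s in summary.items():
    print(f"{name}: {s['count']}/{s['length']} = {s['fraction']:.1%} "
          f"(reported {s['reported_pct']}%)")
```

The recomputed fraction need not round to the printed percentage: 225/333 is about 67.6%, printed as 67%, so the listing appears to truncate rather than round. Keeping both values makes it easier to spot scanning errors in either the counts or the percentage.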



A related GBS gene <SEQ ID 8889> and protein <SEQ ID 8890> were also identified. Analysis of this 
protein sequence reveals the following: 
Lipop: Possible site: -1 Crend: 10
McG: Discrim Score: 22.17
GvH: Signal Score (-7.5): -0.429999
Possible site: 41
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 8 value: -7.91 threshold: 0.0
INTEGRAL Likelihood = -7.91 Transmembrane 94 - 110 ( 86 - ...)
INTEGRAL Likelihood = -7.75 Transmembrane 154 - 170 ( 150 - 176)
INTEGRAL Likelihood = -7.11 Transmembrane 316 - 332 ( 312 - 339)
INTEGRAL Likelihood = -6.16 Transmembrane 258 - 274 ( 253 - 278)
INTEGRAL Likelihood = -2.71 Transmembrane 218 - 234 ( 217 - 234)
INTEGRAL Likelihood = -1.49 Transmembrane 286 - 302 ( 283 - 302)
INTEGRAL Likelihood = -0.96 Transmembrane 73 - 89 ( 73 - 89)
INTEGRAL Likelihood = -0.27 Transmembrane ... - 137 ( ... - 137)
PERIPHERAL Likelihood = 3.29 175
modified ALOM score: 2.08

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.4163 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases:

ORF02059(334 - 1284 of 1620)

EGAD|10465|EC2158(17 - 335 of 349) hypothetical 36.9 kd protein in lysp-nfo intergenic
region {Escherichia coli} OMNI|NT01EC2574 conserved hypothetical protein
SP|P33019|YEIH_ECOLI HYPOTHETICAL 36.9 KDA PROTEIN IN LYSP-NFO INTERGENIC REGION.
GP|405879|gb|AAA60511.1||U00007 yeiH {Escherichia coli} GP|1788482|gb|AAC75219.1||AE000305
orf, hypothetical protein {Escherichia coli} PIR|E64984|E64984 hypothetical 36.9 kD protein
in lysP-nfo intergenic region - Escherichia coli (strain K-12)
%Match = 12.7
%Identity = 32.3 %Similarity = 57.1
Matches = 103 Mismatches = 125 Conservative Sub.s = 79

270 300 330 360 390 435 462 

YSGPLSVFLSRFKACDIIVNWRTIMLFKEKIPGLILCFIIAIPSWLLGLYLPLI GAPVFAILIGIIVG-SFYQN 

MM I : I MM: | :| | |:|: = :1 : | : 

MTNITLQKQHRTLWHFIPGLALSAVIT-GVALWGGSIPAVAGAGFSALTLAILLGMVLGNTIYPH
10 20 30 40 50 60 

489 519 549 579 609 636 666 696 

R-QLFNKGIAFTSKYILQTAWLLGFGLNLMQVMKVGISSLPIIIMTISISLI IAYVL-QKLFKLDKTIATLIGVGSSIC 
: : |: | : | : | : ::| || | : |: MM = I -Ml = = = M I MM Ml = III Mill

IWKSCDGGVLFAKQYLLRLGIILYGFRLTFSQIADVGISGIIIDVLTLSSTFLLACFLGQKVFGLDKHTSWLIGAGSSIC 
80 90 100 110 120 130 140 

726 756 786 816 840 870 900 930 

GGSAIAATAPVINAKDDEVAQAISVIFLFNILAALIFPTLGNFIG--LSDHGFALFAGTAVNDTSSVTATATAWDAINHS

I M= II 11= 1= M l«: . M M = = M = - M I - |= | = : : 1 | | || 

GA3^VIATEPWKAEASKVTVAVATWIFGTVAIFLY?AIYPLMSQV5FSPETFGIYIGSTVHEVAQVVA AGHAI-SP 

160 170 180 190 200 210 220 

960 990 1020 1050 1077 1107 1134 1164

NTLGGATIVKLTRTLAIIPITIVLSIYHMKQTQKEQSVSVTKI-FPKFVLYFILASLLTTIVASLGF-SLRIFEPLKVLS 
= I I 1= I = = I MM Ml I Mlll = IM = = = M : =11 

DAENAAVI SKMLRVMMLAPFLILLAA- RVKQLSGANSGEKSKIT I PWFAILFI WAI F NSFHLLPQSWNMLVTLD 

230 240 250 260 270 280 290 


1194 1224 1254 1284 1314 1344 1374 1404 

KFFIVMAMGAIGINTNVSKLIKTGGKSILLGAAOTLGIIIVSLTMQAILGTW*SCLKLNICNRFHKCYNEDIKRREHYGI 

I :: III 1 = 1= MM I I I I = M = M = 
TFLIAMAMAALGLTTHVSALKKAGAKPLLMALVLFAWLIVGGGAINYVIQSVIA 
310 320 330 340
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1729 

A DNA sequence (GBSx1834) was identified in S. agalactiae <SEQ ID 5373> which encodes the amino
acid sequence <SEQ ID 5374>. Analysis of this protein sequence reveals the following:

Possible site: 57
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood =-10.93 Transmembrane 7 - 23 ( 1 - 27)

Final Results

bacterial membrane --- Certainty=0.5373 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

A related DNA sequence was identified in S. pyogenes <SEQ ID 5375> which encodes the amino acid
sequence <SEQ ID 5376>. Analysis of this protein sequence reveals the following:

Possible site: 40
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood =-16.34 Transmembrane 22 - 38 ( 13 - 42)

Final Results

bacterial membrane --- Certainty=0.7538 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
An alignment of the GAS and GBS proteins is shown below. 

Identities = 56/215 (26%), Positives = 111/215 (51%), Gaps = 5/215 (2%)

Query: 7 VFLTVLVLILIVGAGGLYFWNNHQSLEGICWRTVSLEKQVEKEIEQQLGSQAADMGISAAD 66 

+F+ ++ LIL+ G+ + N+ S+EG WRT S+++++ + ++L I + 

Sbjct: 22 LFVFIIFLILLAVLFGVRYRNS--SIEGIWRTTSIDQKLGDDFAKRLTGLHQSPLIDDS- 78 


Query: 67 LWGANMH^KNDEAKITVTAQIDEVKFHQAIKTFIDKALEKQLKDQGLTYNDLSEAGK 126 

L+ + M + VKN+ ++ + Q++ F + + + L K LK+ L DLS + 
Sbjct: 79 LLTSSQMILTWNOTIVDLSFSVQVERDIFVKRLAAYHQNELLKTLKENHLVVGDLSSKER 138 

Query: 127 KIFDETKITDQQIDQQIDRSFQSAAQAAGGKYNTKTGEMTLPVMDGKVHRLTSVIKV-SH 185

+1 + + +++ +D++F+ A GGKYN TG ++ V+ GKV+R+ I + 

Sbjct: 139 QIIENSMPASHELEMILDQAFEKLASQIGGKYNQKTGHLSAWLKGKVNRILHTIDIKEE 198 

Query: 186 INKKANAFYGNIVKNGEKTAYKKEGSKL-ILGNEK 219 
+ +F ++ Y + G KL +LG+EK

Sbjct: 199 VAAGHTSFSKGLLTPNGYFDYTRFGKKLELLGDEK 233 

SEQ ID 5374 (GBS288) was expressed in E. coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 59 (lane 3; MW 53.7 kDa).
GBS288d was expressed in E. coli as a His-fusion product. SDS-PAGE analysis of total cell extract is
shown in Figure 154 (lanes 8-10; MW 26 kDa) and in Figure 183 (lane 3; MW 26 kDa). It was also expressed
in E. coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 187 (lane 11;
MW 51 kDa). Purified GBS288d-GST is shown in lane 8 of Figure 237.
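
The fusion-product molecular weights quoted above can be cross-checked by subtracting the mass of the tag. The Python sketch below does this under two stated assumptions that are not taken from this document: that the GST moiety contributes about 26 kDa and that the His tag contributes about 1 kDa. The function name `estimated_insert_mass` is hypothetical.

```python
# Assumed approximate tag masses in kDa (not stated in this document).
TAG_MASS_KDA = {"GST": 26.0, "His": 1.0}


def estimated_insert_mass(fusion_mass_kda, tag):
    """Subtract the assumed tag mass from the observed fusion-product mass."""
    return fusion_mass_kda - TAG_MASS_KDA[tag]


# (construct, tag, MW in kDa) as quoted for the SDS-PAGE figures above.
observations = [
    ("GBS288", "GST", 53.7),
    ("GBS288d", "His", 26.0),
    ("GBS288d", "GST", 51.0),
]

estimates = {
    (name, tag): estimated_insert_mass(mass, tag)
    for name, tag, mass in observations
}
for (name, tag), mass in estimates.items():
    print(f"{name} ({tag} fusion): insert of about {mass:.1f} kDa")
```

Under these assumed tag masses the His-tagged and GST-tagged forms of GBS288d give the same insert estimate, which is the kind of internal consistency check the sketch is meant to illustrate.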
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1730 

A DNA sequence (GBSx1835) was identified in S. agalactiae <SEQ ID 5377> which encodes the amino
acid sequence <SEQ ID 5378>. Analysis of this protein sequence reveals the following:

Possible site: 51
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3385 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S. pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1731 

A DNA sequence (GBSx1836) was identified in S. agalactiae <SEQ ID 5379> which encodes the amino
acid sequence <SEQ ID 5380>. Analysis of this protein sequence reveals the following:

Possible site: 51
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood =-12.37 Transmembrane 67 - 83 ( 63 - 89)
INTEGRAL Likelihood = -3.72 Transmembrane 139 - 155 ( 137 - 158)
INTEGRAL Likelihood = -1.54 Transmembrane 115 - 131 ( 114 - 131)

Final Results

bacterial membrane --- Certainty=0.5946 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10905> which encodes amino acid sequence <SEQ ID
10906> was also identified.

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S. pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
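
The INTEGRAL lines of the membrane-topology output list a likelihood, a core transmembrane range and a wider range in parentheses. A short Python sketch for extracting these values is shown below. It assumes only the line layout seen in this document, and the names `parse_tm_segments`, "core" and "outer" are illustrative labels rather than terms used by the prediction program.

```python
import re

# "INTEGRAL Likelihood =-12.37 Transmembrane 67 - 83 ( 63 - 89)"
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\s*\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_tm_segments(text):
    """Return the predicted segments sorted by start position."""
    segments = []
    for lik, start, end, outer_start, outer_end in TM_RE.findall(text):
        segments.append({
            "likelihood": float(lik.replace(" ", "")),
            "core": (int(start), int(end)),
            "outer": (int(outer_start), int(outer_end)),
        })
    return sorted(segments, key=lambda seg: seg["core"][0])


example = """
INTEGRAL Likelihood =-12.37 Transmembrane 67 - 83 ( 63 - 89)
INTEGRAL Likelihood = -3.72 Transmembrane 139 - 155 ( 137 - 158)
INTEGRAL Likelihood = -1.54 Transmembrane 115 - 131 ( 114 - 131)
"""
segments = parse_tm_segments(example)
for seg in segments:
    start, end = seg["core"]
    print(f"{start}-{end} (length {end - start + 1}), likelihood {seg['likelihood']}")
```

In the listings of this document every legible core range spans 17 residues, so a parsed range of any other length is a useful hint that a digit was misread during scanning.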

Example 1732 

A DNA sequence (GBSx1837) was identified in S. agalactiae <SEQ ID 5381> which encodes the amino
acid sequence <SEQ ID 5382>. Analysis of this protein sequence reveals the following:

Possible site: 38
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4709 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S. pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1733 

A DNA sequence (GBSx1838) was identified in S. agalactiae <SEQ ID 5383> which encodes the amino
acid sequence <SEQ ID 5384>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2191 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MTTFLGNPVTFTGKQLQVGDIAKDFLLIATDLSQKSDKDFEGKKKVISWPSIDTGICSK 60 

MTTFLGNPVTFTGKQLQVGD A DF L ATDLS+K+L DF GKKKV+S++PSIDTG+CS 
Sbjct: 1 MTTFLGNPVTFTGKQLQVGDTAHDFSLTATDLSKKTLADFAGKKKVLSIIPSIDTGVCST 60 

Query: 61 QTRTFNEELSELDNTWITVSMDLPFAQKRWCSAEGLDNVILLSDFYDHSFGQEYALLMN 120 

QTR KN+ELS+bDNTWITVS+DLPFAQ +WC+AEG++N ++LSD++DHSFG++YA+L+N 
Sbjct: 61 QTRRFNQELSDLDNTWIWSVDLPFAQGKWCAAEGIENAVMLSDyFDHSFGRDYAVLIN 120 

Query: 121 EWHLLTRAVLILDEHNKVTYTEYVDNVNSDVDYEAAINAAKIL 163 
EWHLL RAVXi+LDE+N VTY EYVDN+N++ DY+AAI A K L 
Sbjct: 121 EWHLIARAVLVLDENNTVTYAEYVDNINTEPDYDAAIAAVKSj 163

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1734 

A DNA sequence (GBSx1839) was identified in S. agalactiae <SEQ ID 5385> which encodes the amino
acid sequence <SEQ ID 5386>. This protein is predicted to be a DNA alkylation repair enzyme. Analysis of
this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4729 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB40581 GB:AJ010128 DNA alkylation repair enzyme [Bacillus cereus]

Identities = 67/217 (30%), Positives = 119/217 (53%), Gaps = 5/217 (2%) 
Query: 6 SLERKFKAASDKEVSKQQEAYLRHHFKCYQIKSPERRMEjYKELIKAAICRQAKIDWQLLDK 65
+L+ FA + E ++ Y+++HF GI++PERR L K++I+ + D+Q++ +
Sbjct: 7 ALQEHFIANQNPEKAEPMARYMKNHFPFLGIQTPERRQLLKDVIQIHTLPDQKDFQVIVR 65

Query: 66 -CWQSDYREYHHFVLDYLLAMSQFLTYNECSRLEFYARHQQWWDSIDVLTKIF-GNLSLK 123
W RE+ LD + + LE + WWD++D + F GN+ L+
Sbjct: 67 ELWDLPEREFQAAMjDMMQKYKMHINETHIPFLEELITOKSWMDTVDSIVPTFLGNIFLQ 126

Query: 124 ...
W++R AI QL +K+K + ++L I + S+EFFI KAIG
Sbjct: 127 ...

Query: 183 ...
Sbjct: ...

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1735 

A DNA sequence (GBSx1841) was identified in S. agalactiae <SEQ ID 5387> which encodes the amino
acid sequence <SEQ ID 5388>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2117 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA81648 GB:Z27121 unknown [Mycoplasma hominis] 
Identities = 67/281 (23%) , Positives = 113/281 (39%) , Gaps = 52/281 (18%) 

FVFDIDGTLCFDGMS--LSKEIQGILERAQIDYGHRVTFATARSYRDT1GILGDKLSLSK 60 
F D+DGTL D + + + + +++A + GH V+ T R +R T+ + +KL L+ 
Sbjct: 14 FAIDLDGTLIJffiSANGTVHPKTEEAIKKA-VAQGHIVSIITGRPWRSTLPVY-EKLGIiMA 71 • 

Query: 61 I IG-LNGATLHENGHLVDSYYLQSDFFSTI ISYCHRHQI PYFVD EVFNYATYQA 113 

I+G NGA +H FF I+Y 4++ Y + E+ NYA 

Sbjct: 72 IVGNYNGAHIHNPA DPF F I PAI T YLDLNEVLY I LGDEKVKKEITNYAIEGP 122 

Query: 114 SKIPFIAYVDPQ KRGELLEVSKIE KPIKMVLYFGDQLGR 152 

+ ++DP KE + + KI KPVL LR 

Sbjct: 123 DWVQLM-HRDPNLERVFGFNQATKFRECINL3KIPLKPTGIVFDVKPDTDVLELLTYLKR 181

Query: 153 ADQMLAELNRFGLSSHFFHEFEKCLYINPIAVDKGKATKKLFG NRFIAFGNDKN 206 

F+ II +DKGK + + +A G+ N 

Sbjct: 182 RYGDLGEFSSWSKGEGLSPVFD ITSIGIDKGKVISLIMRYYNIDIDDTVAMGDSYN 237

Query: 207 DISMFDAAHYSVQVGDFDELTPYANLRVSRESVHEGITTLF 247 
D+SM++ A+ V + + L + V +++ EG F 
Sbjct: 238 DLSMYNVANVCVSPANAEPLIKKMSTVVI1KQrNI<EGAVGYF 278

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
Example 1736

A DNA sequence (GBSx1842) was identified in S. agalactiae <SEQ ID 5389> which encodes the amino
acid sequence <SEQ ID 5390>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2383 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB90005 GB:AE001018 A. fulgidus predicted coding region AF1244 
[Archaeoglobus fulgidus] 
Identities = 22/48 (45%) , Positives = 35/48 (72%) 

Query: 150 GKSIGELNVWHQTGATIVAIEHEGKFIVSPGPFSVIEQGDHIFFVGDE 197 

GKSIGEL + +TGAT++A+ + K I+SP P +V+E GD + +G++ 
Sbjct: 102 GKSIGELGIRSKTGATVIAVLKKEKTIISPSPETVL3PGDKVVVIGEK 149 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5391> which encodes the amino acid 
sequence <SEQ ID 5392>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2446 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 163/213 (76%), Positives = 196/213 (91%) 





[Query/Sbjct rows starting at positions 1, 61, 121 and 181 are illegible; surviving match-line fragments:]

DL+ILTLKHGSGAI+LSKE+AIEF+NQYE++HS+A+LK KIR+ I Q + ME++A LV+

DFL+Q+++VSKQYPIiAPYEII ++DSEH GKSIG LN+WHQTGATI VAIEH G+FIVSPG



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1737 

A DNA sequence (GBSx1844) was identified in S. agalactiae <SEQ ID 5393> which encodes the amino
acid sequence <SEQ ID 5394>. This protein is predicted to be gls24. Analysis of this protein sequence
reveals the following:

Possible site: 16
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2855 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9361> which encodes amino acid sequence <SEQ ID 9362> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAA8S383 GB:U23376 putative 20-kDa protein [Lactococcus lactis] 
Identities = 63/124 (50%) , Positives = 84/124 (66%) 

Query: 1 MSGGFFSNIiKNSVVNSDSVTDGVNVEVGTKEVAVDLDIVVEYGICDIPAIVESIKAIVSQN 60 
+ GGFFSNL +4-N+D VT GV+VEVG +VAVDI1 +V EY K++P I E IK ++ +

Sbjct: 55 vEGGFFSNLTGKLIOTDDvTTGVDVEVGKTQVAVDLKATVTEYRKlWPDIYEKIKEVIRKE 114 

Query: 61 VEVMTHLKVVELNANVVDIKTKAEHEADSVTVQDRVSDAACATGNFASEQAGKAKAAISS 120 
V MT L+WE+M V DIKTK + + D V++QDRV+ AAQ TG F SEQ K K + 
Sbjct: 115 VAAMTELEVvEVNVTVTDIKTKEQQKEDDVSIQDRVTSAAQTTGKFTSEQVDKVKDKVED 174

Query: 121 GAEK 124 
+K 

Sbjct: 175 NTDK 178 


A related DNA sequence was identified in S.pyogenes <SEQ ID 5395> which encodes the amino acid 
sequence <SEQ ID 5396>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2534 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 94/137 (68%) , Positives = 108/137 (78%) , Gaps = 8/137 (5%) 

Query: 1 MSGGFFSNLKNSVVNSDSVTDGVNVEVGTKEVAVDLDIVVEYGKDIPAIVESIKAIVSQN 60 

+ +GGFFSN+ KN+ +VNS + SVTDGV+VEVG+ KEVAVDL I+VEYGKDIPAI ESIKAIVSQN 
Sbjct: 35 WGGFFSNIKNNLWSESVTDGVSVEVGSKEVAVDDAIIVEYGKDIPAIAESIKAIVSQN 94 

Query: 61 vEVMTHLranffil^ANVVDIKTK^ 120 

V+ MTHDKWE+N NWDI+TK EHEA SVTVQDRV+ AA +T F SEQ K K IS 
Sbjct: 95 vDSMTHLKWEVNVNVVDIRTKEEHFAASVTVQDRTC 154 

Query: 121 GAEKTKEAVSNGTEAAK 137 
N EAAK 

Sbjct: 155 TVNSDEAAK 163 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1738 

A DNA sequence (GBSx1845) was identified in S. agalactiae <SEQ ID 5397> which encodes the amino
acid sequence <SEQ ID 5398>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3393 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S. pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1739 

A DNA sequence (GBSx1846) was identified in S. agalactiae <SEQ ID 5399> which encodes the amino
acid sequence <SEQ ID 5400>. Analysis of this protein sequence reveals the following:

Possible site: 16
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3168 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S. pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1740 

A DNA sequence (GBSx1847) was identified in S. agalactiae <SEQ ID 5401> which encodes the amino
acid sequence <SEQ ID 5402>. This protein is predicted to be gls24. Analysis of this protein sequence
reveals the following:

Possible site: 61
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2718 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 



Query: 18 WGELTFEDKVIEKIVGIAIEHVDGLLAWGGFFSNLKNSVWSDSVTDGVNVEVGKKQV 77 

++G LT+EDKV++KIVG+A+E VDGLL+V GGFFSNL ++H+D VT GV+VEVGK QV 
Sbjct: 27 IKGALTYEDKWQKIVGLALESyDGLLSVEGGFFSNIjTGKLINTDDVTTGVDVEVGKTQV 86 

Query: 78 AVDLDIVAEYQKHVPTIFADIIOCWEAEVia^TDLEvVEVNVNVVDIKTRAQHEEDSVT 137 

AVDL +V EY+K+VP 1+ IK+V+ EV MT+LEWEVNV V DIKT+ Q +ED V++ 
Sbjct: 87 AVDLICVVTEYRI<lWPDIYEKII<3ViraCEVAAri?ELEVvEVWrVTDIKTKEQQKEDDVSI 146 
Sbjct: 147 QDRVTSAAQTTGKFTSEQVDKVKDKVEDNTDKEARVK 183

A related DNA sequence was identified in S.pyogenes <SEQ ID 5403> which encodes the amino acid 
sequence <SEQ ID 5404>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3396 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 123/180 (68%), Positives = 158/180 (87%), Gaps = 1/180 (0%)

Query: 1 MTETYIKNTTNNSGTTAWGELTFEDKVIEKIVGIAIEHVDGLLAVNGGFFSNLKNSVVN 60 

MTETYIKNT+ + T+A+RG+LT++DKVTEKIVG+A+E+VDGLL VNGGFF+NLK+ +VN 
Sbjct: 1 MTETYIKNTSKDL-TSAIRGQLTYDDKVIEKIVGLALENVDGLLGVNGGFFANLKDKLVN 59 

Query: 61 SDSVTDGVNvEVGKKQVAvDLDIVAEYQKHVPTIFADIKKVVE 120 

++SV DG VNVEVGKKQVAVDIjD I VAE YQKHVPTI + IK +VE EVKRMTDIi+V+EVNV 
Sbjct: 60 TESWDGVNvEVGKKQVAVDLDIVAEYQKHVPTIYDSIKSIVEEEVKRMTDLDVIEVNVK 119 

Query: 121 VVDIKTRAQHEEDSVTLQDRVTSAAQATGEFASNQVSNVKSAVGSGVDKVEDMKSEPRVQ 180 

WDIKT+ Q E + V+LQD+V+ A++T EF S+QV NVK++V +GV+K++D K+EPRV+ 
Sbjct: 120 WDIKTKEQFEAEKVSLQDKVSDMARSTSEFTSHQVENVKASVDNGVEKLQDQKAEPRVK 179 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1741 

A DNA sequence (GBSx1848) was identified in S. agalactiae <SEQ ID 5405> which encodes the amino
acid sequence <SEQ ID 5406>. This protein is predicted to be a 6-kDa protein. Analysis of this protein
sequence reveals the following:

Possible site: 22
>>> Seems to have a cleavable N-term signal seq.
INTEGRAL Likelihood = -9.29 Transmembrane 25 - 41 ( 23 - 52)

Final Results

bacterial membrane --- Certainty=0.4715 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA86382 GB:U23376 putative 6-kDa protein [Lactococcus lactis] 
Identities = 27/61 (44%) , Positives = 45/61 (73%) 

Query: 3 EFWKYRYPLGGAVIGLVLAAMIvTIGFFKTILALVIIVLGAYAGLYVQRTGMLDQFFNK 62 

++ K RYP+ G ++G ++A I TIGF+K IL L +1 LG Y GL+++++G++DQF N+ 
Sbjct: 2 DYFEKNRYPIIGGIVGALIAVCIFTIGFWKMILVLFLIGLGIYIGLFLKKSGIIDQFINR 61 

Query: 63 R 63 

Sbjct: 62 K 62 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5407> which encodes the amino acid 
sequence <SEQ ID 5408>. Analysis of this protein sequence reveals the following: 
Possible site: 28
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood =-11.73 Transmembrane 11 - 27 ( 6 - 50)
INTEGRAL Likelihood = -7.11 Transmembrane 33 - 49 ( 27 - 50)

Final Results

bacterial membrane --- Certainty=0.5692 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 28/61 (45%) , Positives = 48/61 (77%) 

Query: 3 EFTOKYRYPLGGAVIGLVLAAMIVTIGFFKTILALVIIVLGAYAGLYVQRTGMLDQFFNKR 63 

EF K++YP+ G ++GL++A +++ G FKT+LA++ I+LG Y GLY ++TG++DQF N++ 
Sbjct: 2 EFYEKFKYPIIGGLVGLIIAILLMAFGLFKTLLAIIFIILGIYGGLYAKKTGIIDQFLNRK 62 
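
The identity and gap counts reported for an alignment can be recomputed directly from the two aligned rows. The Python sketch below illustrates the counting on a short made-up pair of sequences; the strings and the function name `alignment_stats` are hypothetical and are not taken from the listings, whose scanned rows contain misread characters. Counting "Positives" would additionally require a substitution matrix and is left out.

```python
def alignment_stats(query, subject, gap="-"):
    """Count identical columns and gapped columns in two aligned rows.

    Both rows must already be aligned, i.e. have the same length with
    gap characters inserted where one sequence has no counterpart.
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    columns = list(zip(query, subject))
    identities = sum(1 for q, s in columns if q == s and q != gap)
    gaps = sum(1 for q, s in columns if q == gap or s == gap)
    return identities, gaps, len(columns)


# Hypothetical aligned rows, used only to illustrate the counting.
identities, gaps, length = alignment_stats("MKT-AILV", "MRTGAI-V")
print(f"Identities = {identities}/{length}, Gaps = {gaps}/{length}")
```

For the made-up pair above, five of the eight columns are identical and two contain a gap.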

A related GBS gene <SEQ ID 8891> and protein <SEQ ID 8892> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 8
McG: Discrim Score: 12.56
GvH: Signal Score (-7.5): -1.11
Possible site: 22
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 1 value: -9.29 threshold: 0.0
INTEGRAL Likelihood = -9.29 Transmembrane 25 - 41 ( 23 - 52)
PERIPHERAL Likelihood = 12.25 44
modified ALOM score: 2.36

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.4715 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

44.3/73.8% over 60aa 
Lactococcus lactis 

EGAD | 42518 | putative 6-kDa protein Insert characterized 

GP|727435|gb|AAA86382.1||U23376 putative 6-kDa protein Insert characterised
ORF01006(307 - 489 of 792)

EGAD|42618|45008(2 - 62 of 62) putative 6-kDa protein {Lactococcus lactis}
GP|727435|gb|AAA86382.1||U23376 putative 6-kDa protein {Lactococcus lactis}
%Match = 11.6
%Identity = 44.3 %Similarity = 73.8
Matches = 27 Mismatches = 16 Conservative Sub.s = 18

159 189 219 249 279 309 339 369 

TNVPEQLEHIQSDVELGLKEFFGLEKKMOTRVFVKQVEEEWGNAK 

I 111= I ::| ::| 
MDYFEKNRYPI IGGIVGALIAV 



MIVTIGFFKTILALVIIVLGAYAGLYVQRTGMLDQFFNI\RK*NFSFIFILHYLNKRKRNYYD*NLHQKHN*QFTODSCSW 

I Mlhl II I :| II I l|:::::|::||| |:: 
CIFTIGFWKMILVLFLIGLGIYIGLFLKKSGIIDQFINRK 
SEQ ID 5406 (GBS14) was expressed in E. coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 9 (lane 4; MW 33.3 kDa). The GBS14-GST fusion product was purified (Figure
190, lane 8) and used to immunise mice. The resulting antiserum was used for FACS (Figure 263), which
confirmed that the protein is immune-accessible on GBS bacteria.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1742 

A DNA sequence (GBSx1849) was identified in S. agalactiae <SEQ ID 5409> which encodes the amino
acid sequence <SEQ ID 5410>. Analysis of this protein sequence reveals the following:

Possible site: 27
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood =-18.63 Transmembrane 61 - 77 ( 51 - 83)
INTEGRAL Likelihood = -7.91 Transmembrane 10 - 26 ( 7 - 28)

Final Results

bacterial membrane --- Certainty=0.8451 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S. pyogenes <SEQ ID 5411> which encodes the amino acid
sequence <SEQ ID 5412>. Analysis of this protein sequence reveals the following:

Possible site: 29

Final Results

bacterial membrane --- Certainty=0.7474 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
An alignment of the GAS and GBS proteins is shown below. 

Identities = 87/193 (45%) , Positives = 127/193 (65%) , Gaps = 4/193 (2%) 

Query: 1 MSKGLKSLYTLLGLISLTLLGFVAVISKQHIYLP-SFNWLDWDFN-LPSPIDVGMYHYFF 58 

MSK LK Y L+GL+ L++ G+V 1+ +IYLP S+ WL W + P+ +D + +Y+F 
Sbjct: 9 MSKLLKISYCLVGLVLLSVFGWVVGITGGYIYLPYSYRWLSWGMDSFPNLLDSALSYYYF 63 

Query: 59 WGALVLFVIVLLAILVVLFYPRRYTEYKIiA- -DKTGKLMLKKSAIEGFVKTEVLKTGLMK 116 

W ALVLFVI LA+LV++ YPR YTE +L +K G L+LKKSAIE +V T + GLM 
Sbjct: 69 WTALVLFVITFLALLVIILYPRIYTEVQLRHKNKKGTLLLKKSAIESYVATAIQTAGLMP 128 

Query: 117 SPSvTAHLYKKKVKVDvKGLLTSRTNVPEQLEHIQSDVELGLKEFFGLESCKMNTRVFVKQ 176 

+P+VTA LYK+K + VKG L SR V +Q+ ++ +E GL EFFG+ +N +V+VK 
Sbjct: 129 NPTVTAlQjYKRKFNIIVKGRLASRVAVADQISGVTCEGIEKGLTEFFGINYPvNFKVYVKD 188 

Query: 177 VEEENVGNAKTNK 189 

Sbjct: 189 IADSDRKHITRNR 201 
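
The summary line that heads each alignment (identities, positives and gaps over the aligned length) can be read mechanically. The following is a minimal sketch in Python, assuming the BLAST-style wording used in this document; the function name `parse_alignment_summary` is hypothetical and not part of any tool cited here.

```python
import re

# Matches fields such as "Identities = 87/193"; the trailing "(45%)" is ignored
# because the fraction can be recomputed from the two counts.
SUMMARY_FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_alignment_summary(line):
    """Return {field: (count, aligned_length)} for one alignment summary line."""
    return {
        name.lower(): (int(count), int(total))
        for name, count, total in SUMMARY_FIELD.findall(line)
    }


def fraction(counts):
    """Convert a (count, aligned_length) pair into a fraction between 0 and 1."""
    count, total = counts
    return count / total


summary = parse_alignment_summary(
    "Identities = 87/193 (45%) , Positives = 127/193 (65%) , Gaps = 4/193 (2%)"
)
identity_fraction = fraction(summary["identities"])
```

Working from the raw counts rather than the printed percentages avoids depending on how the alignment program rounded them.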

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

WO 02/34771 PCT/GB01/04789


Example 1743 

A DNA sequence (GBSx1850) was identified in S.agalactiae <SEQ ID 5413> which encodes the amino
acid sequence <SEQ ID 5414>. Analysis of this protein sequence reveals the following:

Possible site: 17

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -9.82 Transmembrane
INTEGRAL Likelihood = -6.42

Final Results

bacterial membrane --- Certainty=0.4927 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12244 GB:Z99106 similar to hypothetical proteins from B. subtilis [Bacillus 
subtilis] 

Identities = 31/76 (40%) , Positives = 48/76 (62%) 

Query: 1 MSLIWSLIVGAIIGAIAGAVTNKGGSMGWIANILAGLVGSFVGQSLLGTWGPKLAGMALI 60 

+S + SL+V +IG I A+ G +++AGL+G+++G LLGTWGP LAG A+ 

Sbjct: 2 LSFLVSLVUAIVIGLIGSAIVGNRLPGGIFGSMIAGLIGAWIGHGLLGTWGPSLAGFAIF 61 



P+I+GA I V + + 
Sbjct: 62 PAIIGAAIFVFLLGLI 77 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5415> which encodes the amino acid
sequence <SEQ ID 5416>. Analysis of this protein sequence reveals the following:

Possible site: 55
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -7.59 Transmembrane 60 - 76 ( 56 - 80)

Final Results

bacterial membrane --- Certainty=0.4036 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB12244 GB:Z99106 similar to hypothetical proteins from B. subtilis [Bacillus 
subtilis] 

Identities = 28/76 (36%) , Positives = 47/76 (61%) 

Query: 1 MGLIWTLIVGALIGVIAGALTKKGGSMGWIANIAAGLVGSSVGQALLGSWGPSLAGMSLI 60 

+ + +L+V +IG+I A+ G ++ AGL+G+ +G LLG+WGPSLAG ++ 

Sbjct: 2 LSFLVSLWAIVIGLIGSAIVGNRLPGGIFGSMIAGLIGAWIGHGLLGTOGPSLAGFAIF 61 

Query: 61 PSVIGAVIWMITSFV 76 

P++IGA 1 V + + 
Sbjct: 62 PAIIGAAIFVFLLGLI 77 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 63/82 (75%) , Positives = 74/82 (89%) 

Query: 1 MSLIWSLIVGAIIGAIAGAVTNKGGSMGWIANILAGLVGSFVGQSLLGTWGPKLAGMALI 60

M LIW+LIVGA+IG IAGA+T KGGSMGWIANI AGLVGS VGQ+LLG+WGP LAGM+LI 
Sbjct: 1 MGLIWTLIVGALIGVIAGALTKKGGSMGWIANIAAGLVGSSVGQALLGSWGPSLAGMSLI 60 

Query: 61 PSIVGAIIWIVTSFVLGKMNN 82 

PS++GA+ IW++TSFVL K NN 
Sbjct: 61 PSVIGAVIWMITSFVLNKTNN 82

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1744 

A DNA sequence (GBSx1851) was identified in S.agalactiae <SEQ ID 5417> which encodes the amino
acid sequence <SEQ ID 5418>. Analysis of this protein sequence reveals the following:

Possible site: 59

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -9.82 Transmembrane 88 - 104 ( 84 - 111)
INTEGRAL Likelihood = -8.07 Transmembrane 29 - 45 ( 27 - 54)

Final Results

bacterial membrane --- Certainty=0.4927 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12244 GB:Z99106 similar to hypothetical proteins from B. subtilis [Bacillus 
subtilis] 

Identities = 29/77 (37%) , Positives = 47/77 (60%)

Query: 31 IMGLIWSLIVGAIIGAIAGAITNKGGSMGWIANILAGLVGSFVGQSLLGTWGPKLADMAL 90

++ + SL+V +IG I AI G +++AGL+G+++G LLGTWGP LA A+

Sbjct: 1 MLSFLVSLVVAIVIGLIGSAIVGNRLPGGIFGSMIAGLIGAWIGHGLLGTWGPSLAGFAI 60

Query: 91 IPSIVGAIIVIIVTSFV 107

P+I+GA I + + +
Sbjct: 61 FPAIIGAAIFVFLLGLI 77

There is also homology to SEQ ID 5416:

Identities = 60/79 (75%) , Positives = 72/79 (90%) 

Query: 32 MGLIWSLIVGAIIGAIAGAITNKGGSMGWIANILAGLVGSFVGQSLLGTWGPKLADMALI 91 
MGLIW+LIVGA+IG IAGA+T KGGSMGWIANI AGLVGS VGQ+LLG+WGP LA M+LI 
Sbjct: 1 MGLIWTLIVGALIGVIAGALTKKGGSMGWIANIAAGLVGSSVGQALLGSWGPSLAGMSLI 60

Query: 92 PSIVGAIIVIIVTSFVLGK 110 

PS++GA+IV+++TSFVL K 
Sbjct: 61 PSVIGAVIWMITSFVLNK 79 
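
The "Identities" figure reported for an alignment is a count of columns in which the two aligned sequences carry the same residue. The sketch below, in Python, counts identical columns for a single aligned block; it assumes both strings have already been padded to equal length with `-` for gaps, and the helper name `count_identities` is hypothetical. The two strings are the first block of the alignment above (query residues 32-91, subject residues 1-60).

```python
def count_identities(query, sbjct):
    """Count aligned columns in which both sequences carry the same residue."""
    if len(query) != len(sbjct):
        raise ValueError("aligned blocks must have the same number of columns")
    # A gap character never counts as an identity, even if both rows have one.
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")


# First block of the alignment to SEQ ID 5416, copied from the text above.
QUERY_BLOCK = "MGLIWSLIVGAIIGAIAGAITNKGGSMGWIANILAGLVGSFVGQSLLGTWGPKLADMALI"
SBJCT_BLOCK = "MGLIWTLIVGALIGVIAGALTKKGGSMGWIANIAAGLVGSSVGQALLGSWGPSLAGMSLI"

block_identities = count_identities(QUERY_BLOCK, SBJCT_BLOCK)
block_fraction = block_identities / len(QUERY_BLOCK)
```

Summing the per-block counts over every block of an alignment gives the numerator of the reported identity ratio; "Positives" would additionally require a substitution matrix, which this sketch does not attempt.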


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1745 

A DNA sequence (GBSx1852) was identified in S.agalactiae <SEQ ID 5419> which encodes the amino
acid sequence <SEQ ID 5420>. This protein is predicted to be ATP-dependent DNA helicase Rep (uvrD).
Analysis of this protein sequence reveals the following:

Possible site: 22

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1364 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9863> which encodes amino acid sequence <SEQ ID 9864>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAD51119 GB:AF176554 DNA helicase PcrA [Leuconostoc citreum]
Identities = 414/764 (54%), Positives = 537/764 (70%), Gaps = 23/764 (3%)

Query: 6 VEMNPLIIGMNDKQAEAVQTTDGPLLIMAGAGSGKTRVLTHRIAYLIDEKYVNPWNILAI 65
+ + L GMN+KQAEAVQTT+GPLLIKAGAGSGKTRVLTHRIA+L+ + V PW ILAI
Sbjct: 1 MSVETLTNGMNNKQAEAVQTTEGPLLIKAGAGSGKTRVLTERIAHLVQDLNVFPWRILAI 60

RTLMKR+I LNLDT +++ R+ILG ISNAKND+L

EL+RS+++DFDDLIM+T+ LF DVLA YQQ+++Y+HVDEYQDTN AQY +V LLA

R KN+ WGDADQS I YGWRGA+M NIL+FEKDYP A V+LE+NYRST+ IL AAN VIN

HN R PKKLWT+N +G+QI Y+RA E +EA P+ S I + + + DFAVLYRTN

AQSR IEE+L+K 1 N+PY+MVGG KFY RKEI D++AY++++ N DN +FER+VNEPKRG

+G +L ++R A ++S 4

h ++GY + L +N +SQAR+EN+E3FLSVTK FDD

FL AL++D DD VTLMTLHAAKGLE FP WFL I G+ +EG+ FPLSRA+ DDI

EEERRLAYVGITRA + LFLTNA +R L+G+T N P+RFI EI EL++ Y GL+R

+HKICWG GTV+ VSG QELK+ FP G+K+IiA+ 1


A related DNA sequence was identified in S.pyogenes <SEQ ID 5421> which encodes the amino acid
sequence <SEQ ID 5422>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0214 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 622/772 (80%), Positives = 699/772 (89%), Gaps = 15/772 (1%) 

Query: 8 MNPLIIGMNDKQAEAVQTTDGPLLIMAGAGSGKTRVLTHRIAYLIDEKYVNPWNILAITF 67

MNPL+ GMND+QA+AVQTT+GPLLIMAGAGSGKTRVLTHRIAYLIDEK+VNPWNILAITF
Sbjct: 1 MNPLLNGMNDRQAQAVQTTEGPLLIMAGAGSGKTRVLTHRIAYLIDEKFVNPWNILAITF 60

Query: 68 TNKAAREMRERAIALNPATQDTLIATFHSMCVRILRREADYIGYNRNFTIVDPGEQRTLM 127

TNKAAREM+ERA+ALNPAT+DTLIATFHSMCVRILRREAD+IGYNRNFTIVDPGEQRTLM
Sbjct: 61 TNKAAREMKERALALNPATKDTLIATFHSMCVRILRREADHIGYNRNFTIVDPGEQRTLM 120

Query: 128 KRI 1 KQLNLDTKKWNERS I LGTI SNAKNDLLDE IAYEKQAGDMYTQVI AKCYKAYQEELR 187 

KRI+KQLN+D KKWNERS 1 LGTI SNAKNDLLDE YE QA DMY+Q+ +A+ CYKAYQEELR 
Sbjct: 121 KRILKQLNIDPKKWNERSILGTISNAKNDLLDEKGYEAQAADMYSQIVARCYKAYQEELR 180 

Query: 188 RSEAMDFDDLIMMTLRLFDQNKDVLAYYQQRYQYIHVDEYQDTNHAQYQLVKLLASRFKN 247 

RSEA+DFDDLIMMTLRLFD N DVLAYYQQRYQYIHVDEYQDTNHAQYQL+KLLASRFKH 
Sbjct: 181 RSEALDFDDLIMMTLRLFDANPDVLAYYQQRYQYIHVDEYQDTNHAQYQLIKLLASRFKN 240 

Query: 248 ICTVGDADQSIYGWRGADMQNILDFEKDYPQAKVVLLEFjNYRSTKKILQAANNVINHNKN 307 

ICWGDADQSIYGWRGADMQNILDFEKDYP AKWLLEENYRSTKKILQAAN+VIN+N+N 
Sbjct: 241 ICWGDADQSIYGWRGADMQNILDFEKDYPDAKOTLLEENTOSTKKILQA.ANDVINNNRN 300 

. Query: 308 RRPKKLWTQiroEGEQIVYHRANNEQEEAVFVASTIDNIvREQGKNFroFAVLYRTNAQSR 367 
RRPKKLWTQN +GEQ+VY+RAN+E++EAVFVASTI N+ +E GKNFKDFAVLYRTNAQSR 
Sbjct: 301 RRPKKLWTQNADGEQLVYYRANDERDEAVFVASTISNMSQELGKNFKDFAVLYRTNAQSR 350 

Query: 368 TIEEALLKSNIPYTW/GGTKFYSRKEIRDVIAYLNILAOTSDNISFERIVNEPKRGVGPG 427 

TIEEALLKSNIPYTMVGGTKFYSRKEIRD+IAYL I+AN +DNISFERIVNEPKRGVGPG 
Sbjct: 351 TIEEALLKSNIPYTMVGGTKFYSRKEIRDLIAYLTIVANPADNISFERIVNEPKRGVGPG 420 

Query: 428 TLEKIRSFAYEQSMSLLDASSNVMMSPLKGKAAQAWTOLANLILTLRSNLDSLTVTEITE 487 

TL+K+R FAYE SLL+A+ SN+ +MSPLKGKAAQA+ DLAN++ LR +LD +++T++ E 
Sbjct: 421 TLDKLRQFAYESDQSLLEAASNLLMSPLKGKAAQAIMDLANILGQLRQDLDQMSITDLAE 480 

Query: 488 NLLDKTGYLEALQVQNTLESQARIENIEEFLSVTKNFDDNPEITVEGETGLDRLSRFLND 547 

LL+KTGYL++L++QNTLESQARIENIEEFLSVTKNFD++ E ETG+DRL RFLND 

Sbjct: 481 ALLEKTGYDDSLRLQNTLESQARIENIEEFLSVTKNFDESSASQEEDETGVDRLGRFLND 540 

Query: 548 LALIADTDDSATETAEVTLMTLHAAKGLEFPWFLIGMEEGVFPLSRAIEDADELEEERR 607 

LALIADTDDS E AEVTLMTLHAAKGLEFPWFLIGMEEGVFPLSRA ED DELEEERR 
Sbjct: 541 LALIADTDDSQAEAAEVTLMTLHAAKGLEFPWFLIGMEEGVFPLSRASEDPDELEEERR 600 

Query: 508 LAYVGITRAEQILFLTNANTRTLFGKTSYNRPTRFIREIDDELIQYQGLARPVNSSFGVK 667 

LAYVGITRAE++LF+TNANTRTLFGK+SYNRPTRF4-+EI +EL+ Y+GLARP SSFGV+ 
Sbjct: 601 LAWGITRAEEVLFMTNANTRTLFGKSSYNRPTRFLKEISEELLSYKGLARPAQSSFGVR 660 

Query: 568 YSKEQPTQFGQGMSLQQALQARKSNSQSQVTAQ-LQA LNANNS-HET 712 

+S E TQFGQGMSL +ALQARK+ +Q + +AQ +QA +N+S E 

Sbjct: 661 FSTETHTQFGQGMSLSEALQARKAQAQVRQSAQPMQAHTIPSASTSSVLPFGSNSSVEEV 720 

Query: 713 SWEIGDVATHKKWGDGTVLEVSGSGKTQELKINFPGIGLKKLLASVAPISKK 764 

+W+IGD+A HKKWGDGTVLEVSGSGKT ELKI FP +GLKKLLASVAPI KK 
Sbjct: 721 TWQIGDIAHHKKWGDGTVLEVSGSGKTKELK1KFPEVGLKKLLASVAPIEKK 772 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1746

A DNA sequence (GBSx1853) was identified in S.agalactiae <SEQ ID 5423> which encodes the amino
acid sequence <SEQ ID 5424>. Analysis of this protein sequence reveals the following:

Possible site: 43
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4741 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA88579 GB:M14339 unknown [Streptococcus pneumoniae] 
Identities = 43/57 (75%) , Positives = 50/57 (87%) 

Query: 41 AHGGYLFTLCDQVSGLVAISTGYEAVTLQSNINYLRAGRLDDLLTVIGTCVHNGRTT 97

AHGGYLFTLCDQ+SGLV IS G + VTLQS+INYL+AG+LDD+LT+ G CVH GRTT
Sbjct: 1 AHGGYLFTLCDQISGLVVISLGLDGVTLQSSINYLKAGKLDDVLTIKGECVHQGRTT 57



A related DNA sequence was identified in S.pyogenes <SEQ ID 5425> which encodes the amino acid
sequence <SEQ ID 5426>. Analysis of this protein sequence reveals the following:

Possible site: 48

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1210 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 57/97 (58%) , Positives = 74/97 (75%)

Query: 2 KFNLEQVTCVFENYEIENWEEGQVTLTTKVVDSSLNYYGNAHGGYLFTLCDQVSGLVAIST 61

+ L + +F+NY+IE E+G + L+T+V +++LNYYGNAHGGYLFTLCDQV GLVA +T
Sbjct: 7 EMTLNVISIFDNYQIELAEKGHLILSTEVTETALNYYGNAHGGYLFTLCDQVGGLVARTT 66

Query: 62 GYEAVTLQSNINYLRAGRLDDLLTVIGTCVHNGRTTK 98

G E+VTLQ+N NYL+AG D L V G VH GRTT+
Sbjct: 67 GVESVTLQANANYLKAGHKGDKLMVEGRLVHGGRTTQ 103

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1747 

A DNA sequence (GBSx1854) was identified in S.agalactiae <SEQ ID 5427> which encodes the amino
acid sequence <SEQ ID 5428>. Analysis of this protein sequence reveals the following:
Possible site: 22

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3187 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 1748 

A DNA sequence (GBSx1855) was identified in S.agalactiae <SEQ ID 5429> which encodes the amino
acid sequence <SEQ ID 5430>. This protein is predicted to be uracil permease (uraA). Analysis of this
protein sequence reveals the following:

Possible site: 54
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = Transmembrane - 138 ( 117 -
INTEGRAL Likelihood = Transmembrane - 228 ( 204 -
INTEGRAL Likelihood = Transmembrane - 76 ( 49 -
INTEGRAL Likelihood = Transmembrane - 165 ( 145 -
INTEGRAL Likelihood = Transmembrane - 418 ( 401 -
INTEGRAL Likelihood = Transmembrane - 438 ( 420 -
INTEGRAL Likelihood = Transmembrane - 381 ( 364 -
INTEGRAL Likelihood = Transmembrane - 200 ( 182 -
INTEGRAL Likelihood = Transmembrane - 362 ( 345 -
INTEGRAL Likelihood = Transmembrane - 276 ( 260 -

Final Results

bacterial membrane --- Certainty=0.4461 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9865> which encodes amino acid sequence <SEQ ID 9866>
was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA53697 GB:X76083 uracil permease [Bacillus caldolyticus] 
Identities = 208/416 (50%), Positives = 291/416 (69%), Gaps = 11/416 (2%) 


Query: 32 LLDIDEKPELFQGLLLSFQHVFAMFGATILVPLILGMPVSVALFASGCGTLIYQVATKFK 91 

+LDI ++P + Q + LS QH+FAMFGATILVP ++S+ S+AL SG GTL -I- + TK++ 
Sbjct: 5 VLDIQDRPTVGQWITLSLQHLFAMFGATILVPYLVGLDPSIALLTSGLGTLAFLLITKWQ 64 

Query: 92 VPVYLGSSFAYITAMALAMKQMHGDISAAQTGILFVGLIYvWATVIKFVGNSWVDKILP 151

VP YLGSSFAYI + A + G AA G GL+Y TO +IK G WV K+LP 
Sbjct: 65 VPAYLGSSFAYIAPIIAA--KTAGGPGAAMIGSFLAGLVYGWALIIKKAGYRWVMKLLP 122 

Query: 152 PIIIGPMIIVIGLGLANSAVTNA--GFVAKGDWRKMLVAVVTFLIAAFINTKGKGFIKII 209 
P+++GP+IIVIGLGLA +AV AG K VA+VT + +G + +I

Sbjct: 123 PWVGPVIIVIGLGLAGTAVGNmiWGPDGKYSLLHFSVALVTLAATIVCSVLARGMLSLI 182 

Query: 210 PFLFAIIGGYILSIILGLVDLSPVEKAAWFELPKFYLPFKTGLFHSYKLYFGPEMLAIIj- 268 
PL 1+ GY+ ++ +GLVDLS V A WFE P F +PF Y + E++ ++ 

Sbjct: 183 PVLVGI WGYLYALAVGLVDLSKVAAAKWFEWPDFLI PEA DYPVRVTWE I VMLMV 237

Query: 269 PISIVTIAENIGDHTVLGQICGRKFLKKPGLNRLLIGDGLATAFSALIGGPAETTYGENT 328 

P+4-IVT++E+IG VL ++ GR+ ++KPGL+R ++GDG AT SAL+GGP +TTYGEN 
Sbjct: 238 PVAIVTLSEHIGHQLVLSKWGRDLIQKPGLHRSILGDGTA"MISALLGGPPKTTYGENI 297 


Query: 329 GVIGMTRIASVTVIRNAAFIAIAFSFFGKFTALISTIPSAVLGGMAILLYGVIASNGLKV 388 

GV+ +TR+ SV V+ AA IAIAF F GK TALIS+IP+ V+GG+-HLL+G+IAS+GL++ 
Sbjct: 298 GVLAITRVYSVYVLAGAAVIAIAFGFVGKITALISSIPTPVMGGVSILLFGIIASSGLRM 357 

Query: 389 LIENRVNFAEVRNLIIASSMLVLGLGGAVLDLG-ALTLSGTALSAIVGIILNLILP 443

LI++RV+F + RNL+IAS +LV+G+GGAVL + + ++G ALSAIVG++LNLILP 
Sbjct: 358 LIDSRVDFGQTRNLVIASVILVIGIGGAVLKISDSFQITGMALSAIVGVLLNLILP 413

A related DNA sequence was identified in S.pyogenes <SEQ ID 5431> which encodes the amino acid
sequence <SEQ ID 5432>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = Transmembrane - 193 ( 171 - 206)
INTEGRAL Likelihood = Transmembrane - 329 ( 304 - 339)
INTEGRAL Likelihood = Transmembrane - 170 ( 152 - 175)
INTEGRAL Likelihood = Transmembrane - 392 ( 374 - 395)
INTEGRAL Likelihood = Transmembrane - 41 ( 22 - 43)
INTEGRAL Likelihood = Transmembrane - 136 ( 116 - 142)
INTEGRAL Likelihood = Transmembrane - 112 ( 90 - 117)
INTEGRAL Likelihood = Transmembrane - 355 ( 338 - 360)
INTEGRAL Likelihood = Transmembrane 396 - 412 ( 396 - 413)

Final Results

bacterial membrane --- Certainty=0.5288 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:CAB89870 GB:AJ132624 uracil transporter [Lactococcus lactis] 
Identities = 294/421 (69%), Positives = 359/421 (84%), Gaps = 5/421 (1%)

Query: 3 DVIYDVEEVPKAGMLVGLSFQHLFAMFGATVLVPILVGIDPSVALLSSGLGTLAHLSVTK 62
D+I V+E P A GLSFQHLFAMFG+TVLVPILVGI+P++ALLSSGLGTLAH+SVTK
Sbjct: 5 DIILKVDEKPAASQWFGLSFQHLFAMFGSTVLVPILVGINPAIALLSSGLGTLAHMSVTK 64

Query: 63 FKIPAYMGSSFAYIAAMQLLMKTNGIGAVAQGAMTGGLVYLIVALIVKAIGNDWIDNILP 122
FK+PAYMGSSFAYI AM LLMK G+ A+AQGAMTGGLVYLIVALIVK G WID +LP
Sbjct: 65 FKVPAYMGSSFAYIGAMTLLMKNGGMPAIAQGAMTGGLVYLIVALIVKFAGKGWIDKVLP 124

PIWGPIVMVIGLSLA TA+ND M

LTMAPIAFVTMTEHFGHIMVLNSLTK+DYFK+PGLEKTLTGDG AQI IAGF+GAPPVTSY

GLKIL+E+KVD D K+NLLI+SV+LV GIGG+-H- + LQIS VA +T+LGI+L VLP+
Sbjct: 365 GLKILVENKVDFDIKRNLLISSVVLVIGIGGMIINITQNLQISSVAIATILGIVLNLVLPK 425

An alignment of the GAS and GBS proteins is shown below. 

Identities = 186/425 (43%) , Positives = 282/425 (65%) , Gaps = 17/425 (4%) 

Query: 30 NLLLDIDEKPELFQGLLLSFQHVFAMFGATILVPLILGMPVSVALFASGCGTLIYQVATK 89 

+++ D++E P+ + LSFQH+FAMFGAT+LVP+++G+ SVAL +SG GTL + TK 
Sbjct: 3 DVIYDVEEVPKAGMLVGLSFQHLFAMFGATVLVPILVGIDPSVALLSSGLGTLAHLSVTK 62 

Query: 90 FKVPVYLGSSFAYITAMALAMKQMHGDISAAQTGILFVGLIYVVVATVIKFVGNSWVDKI 149

FK+P Y+GSSFAYI AM L MK I A G + GL+Y++VA ++K +GN W+D I

Sbjct: 63 FKIPAYMGSSFAYIAAMQLLMKT--NGIGAVAQGAMTGGLVYLIVALIVKAIGNDWIDNI 120

Query: 150 LPPIIIGPMIIVIGLGLANSAVTNAGFVAKGDWRK--MLVAVVTFLIAAFINTKGKGFIK 207
LPPI++GP+++VIGL LA++AV + + G++ +++ +VT L FN GKG +
Sbjct: 121 LPPIVVGPIVMVIGLSLASTAVNDV-MLKNGNYHLTYLVIGLVTLLSVIFraiYGKGIVA

Query: 208 IIPFLFAIIGGYILSIILG LVDLSPVEKAAWFELPKFYLPFKTGLFHSYKLYFG 261
I+P L ++ GY++++++G +VD + V +A WF +P +PF T Y + F
Sbjct: 180 IVPLLLGLLVGYWALLVGVLTGQEIVDFTOVAQAKWF.

Query: 262 PE-MLAILPISIVTIAENIGDHTVLGQICGRKFLKKPGLNRLLIGDGLATAFSALIGGPA 320
P +L + PI+ VT+ E+ G VL + R++ K PGL + L GDG A + +G P

Query: 321 ETTYGENTGVIGMTRIASVTVIRNAAFIAIAFSFFGKFTALISTIPSAVLGGMAILLYGV 380
T+YGEN GV+ + +I SV VI AA IA SF GK +ALI +IP+ V+GG+++ L+GV
Sbjct: 295 OTSYGENIGVMALNKIFSVYVIAGAAVIAALLSFIGKVSALIQSIPTPVIGGISVALFGV

Query: 381 IASNGLKVLIENRVNFAEVRNLIIASSMLVLGLGGAVLDLGALTLSGTALSAIVGIILNL 440
IAS+GLK+LIE++V+ +NL+IAS +LV G+GG +L + L +SG A S ++GIIL
Sbjct: 355 IASSGLKILIESKVDMDNKKNLLIASVILVSGIGGtMLQVNGLQISGVAFSTLLGIILYQ 414

Query: 441 ILPKE 445
+LP++
Sbjct: 415 VLPEK 419





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1749

A DNA sequence (GBSx1856) was identified in S.agalactiae <SEQ ID 5433> which encodes the amino
acid sequence <SEQ ID 5434>. Analysis of this protein sequence reveals the following:
Possible site: 20

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3863 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1750

A DNA sequence (GBSx1857) was identified in S.agalactiae <SEQ ID 5435> which encodes the amino
acid sequence <SEQ ID 5436>. This protein is predicted to be a sodium/alanine symporter. Analysis of this
protein sequence reveals the following:



Possible site: 22

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-10 Transmembrane - 207 ( 184 -
INTEGRAL Likelihood = -8 Transmembrane 151 - 167 ( 148 - 17i;
INTEGRAL Likelihood = -8 Transmembrane 217 - 233 ( 216 - 238)
INTEGRAL Likelihood = -6 Transmembrane 312 - 328 ( 310 - 333)
INTEGRAL Likelihood = -6 Transmembrane 357 - 373 ( 349 - 376)
INTEGRAL Likelihood = -5 Transmembrane 424 - 440 ( 422 - 441)
INTEGRAL Likelihood = -5 Transmembrane 396 - 412 ( 390 - 417)
INTEGRAL Likelihood = -0 Transmembrane 25 - 41 ( 25 - 41)

Final Results

bacterial membrane --- Certainty=0.5352 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9867> which encodes amino acid sequence <SEQ ID 9868>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC22541 GB:U32770 amino acid carrier protein, putative
[Haemophilus influenzae Rd]
Identities = 255/443 (57%) , Positives = 333/443 (74%) , Gaps = 4/443 (0%)

Query: 11 TLFTHINSFVWGPPLLALLVGTGIYLSFRLGFIQLRQLSRAFKLIFREDNG-QGDISSYA 69
++ + I+SF+WG PLL LL GTG+YL+ RLGFIQ+R L RA +F++D G +GD+SS+A
Sbjct: 5 SILSAIDSFIWGAPLLILLSGTGLYLTLRLGKIQIRYLPRALGYLFKKDKGGKGDVSSFA 64

Query: 70 ALATAIAATVGTGNIVGVATAIKSGGPGALFWMWVAAFFGMATKYAEGLLAIKYRTKDTN 129
AL TALAAT+GTGNIVGVATA+++GGPGA+FWMW+ A GMATKYAE LLA+KYR +D N
Sbjct: 65 ALCTALAATIGTGNIVGVATAVQAGGPGAIFWMWLVALLGMATKYAECLLAVKYRVRDKN 124

Query: 130 GEISGGPMYYIINGMGQKWKPLAVFFSAAGILVALLGIGTFTQVNAIASSLEHTFKISTR 189
G ++GGPMYYI G+G +W LA F+ G++VA GIGTF QVNAI +++ TF I
Sbjct: 125 GFMAGGPMYYIERGLGIRW--LAKLFALFGVMVAFFGIGTFPQVNAITHAMQDTFNIPVL 182

Query: 190 FTSLIIAVIVLFIIFGGIKSISKVSEKIVPFMAISYILATLIIIAVNYNKIPHTFQLIFS 249
T++I + ++V II GG+K I+ S IVPFMAI Y+ +L+II +N K+P LI
Sbjct: 183 VTAIIVTLLVGLIILGGVKRIATASSVIVPFMAILYVTTSLVIILLNIEKVPDAILLIID 242

Query: 250 GAFSGTAAIGGFSGAIVKEAIQKGIARGVFSNESGLGSAPIAAAAAKTKEPVEQGLISMT 309
AF AA+GG G V +AIQ G+ARG+FSNESGLGSAPIAAAAA+T+EPV QGLISMT
Sbjct: 243 SAFDPQAALGGAVGLTVMKAIQSGVARGIFSNESGLGSAPIAAAAAQTREPVRQGLISMT 302

Query: 310 GTFIDTIVICTLTGIAILVTGKWLEFDLQGAPLTQASFNTVFG-SLGSFALTFCLVLFAF 368
GTF+DTI++CT+TGI +++TG W +L GA +T +F G S+G+ +T L+ FAF
Sbjct: 303 GTFLDTIIVCTMTGIVLVLTGAVIlOTPEIiAGATVTNYAFAQGLGTSIGATI'VTVGLLFFAF 362

Query: 369 TTILGWSYYGERCFEYLFGTKFINAYRIIFVIMVGLGGFLQLDLIWVIADIVNGLMALPN 428
TTILGW YYGERCF YL G + + YR+ ++++VGLG FL L+LIW+IADIVNGLMA PN
Sbjct: 363 TTILGWCYYGERCFVYLVGIRGVKLYRLAYIMLVGLGAFLHLNLIWIIADIVNGLMAFPN 422

Query: 429 LIALLALSPIIVKETQKYFSETK 451
LIAL+ L +I++ET+ YF K
Sbjct: 423 LIALIGLRKVIIEETKDYFQRLK 445



A related DNA sequence was identified in S.pyogenes <SEQ ID 5437> which encodes the amino acid
sequence <SEQ ID 5438>. Analysis of this protein sequence reveals the following:

Possible site: 45

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-11.36 Transmembrane 183 - 199 ( 175 - 206)
INTEGRAL Likelihood = -7.80 Transmembrane 143 -     ( 140 - 163)
INTEGRAL Likelihood = -7.11 Transmembrane 209 - 225 ( 208 - 229)
INTEGRAL Likelihood = -5.95 Transmembrane 416 - 432 ( 413 - 434)
INTEGRAL Likelihood = -5.15 Transmembrane 304 - 320 ( 302 - 324)
INTEGRAL Likelihood = -4.46 Transmembrane 387 - 403 ( 382 - 408)
INTEGRAL Likelihood = -3.35 Transmembrane 348 - 364 ( 345 - 366)
INTEGRAL Likelihood = -1.17 Transmembrane 11 - 27 ( 10 - 28)

Final Results

bacterial membrane --- Certainty=0.5543 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAF94579 GB:AE004221 sodium/alanine symporter [Vibrio cholerae]
Identities = 261/441 (59%), Positives = 328/441 (74%), Gaps =7/441 (1%) 

Query: 3 ALVKLIDNLVWGPPLLILLVGTGIYLTSHLGLIQILKLPRAFKIilFSDDEG HGDISS 59 

+ ++ +D+LVWGPPLLILLVGTG+Y T LGL+Q +LP A 4+F ++ GD+SS 
Sbjct: 6 SFLQTVDSLWGPPLLILLVGTGVYFTBT?)^LLQFRRLPTAL?WVFGREKSSDKQGDVSS 65 

Query: 60 FARLATMiAAWGTGNIVGVATAIKSGGPGMiFWMWVARFFG^TKyAEGVIjaiKyRTKD 119 

FAAL TAL+AT+GTGNI VGVATAI K GGPGALFWMW+AA FGMATKYAE +LA+KYR D 
Sbjct: 66 FA^CTALSATIGTGNIVGVATAIKLGGPGABFViMllAALFG^TKYAECLIAViMRQID 125 

Query: 120 ANGHISGGPMYYIVNGMGTKWKPLAVLFAGSGILVALFGIGTFAQVNSITSSLGHSFGLS 179 

G + GGPMYY+ +G+ +K LAVLFA + VA FGIGTF QVN+I + SFG+ 
Sbjct: 126 DKGQMVGGPMYYLRDGVSSK--TLAVLFAVFAVGVACFGIGTFPQVMAILDATQISFGVP 183 

Query: 180 PQMVSIVrAIFVAAIIFGGIHSISKVAEIWVPFMAIFYILSSrAVIFSHYQQLLPVIRLV 239 

+ ++VL + VA + GGI SI+KVA KWP MA+FYI++ L+VI ++ +L + LV 
Sbjct: 184 RFASAVVLTVLVAIVTIGGIQSIAKVAGKWPAMALFYIIACLSVIVTOADKLADAVELV 243 

Query: 240 FQSAFTPTAAIGGFAGSLMKDAIQKGIARGVFSNESGLRSAPIMAAAKTNEPVEQGLIS 299 

SAFT TAA GGF G+ + AIQ GIAKGVFSNESGL SAP+AAAAMCT+ VEQGLIS 
Sbjct: 244 LVSAFTSTAATGGFLGASIMLAIQSGIARGVFSNESGLGSAPMAAAAAKTDSCVEQGLIS 303 

Query: 300 MTGTFIDTIIICTIjTGLSILVTGQWTGQLEGAPLTQSAFATVFG--NLGTFGLTFSLVLF 357 

MTGTF DTIIICT+TGL++++TG W L ffl +T AFAT +G ++ L+ F 

Sbjct: 304 MTGTFFDTIIICTMTGIALILTGAWQSDLSGAAMTTYAFATGLNAQTIGPMLVSIGLMFF 3S3 

Query: 358 AFTTILGWSYYGERCFEFLFGITHLTYFRIVFILMVGLGGFLKLELIWVLADIVNGLMAL 417 

AFTTILGW+YYGERC FLFG + ++IVFI ++ G FL L+LIW++ADIVNGLMA+ 
Sbjct: 364 AFTTILGWNYYGERCMVFLFGTKAVLPYKIVFIGLIASGAFLHLDLIWIIADIVNGLMAI 423 

Query: 418 PNLIALLALSPWILETKHYF 438 

PNLI L+AL W+ ETK YF 
Sbjct: 424 PNL I GLVALRHWVEETKQ Y F 444 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 323/439 (73%), Positives = 380/439 (85%), Gaps = 1/439 (0%)

Query: 9 MLTLFTHINSFVWGPPLLALLVGTGIYLSFRLGFIQLRQLSRAFKLIFREDNGQGDISSY 68

M+ L I++ VWGPPLL LLVGTGIYL+ LG IQ+ +L RAFKLIF +D G GDISS+
Sbjct: 1 MIALVKLIDNLVWGPPLLILLVGTGIYLTSHLGLIQILKLPRAFKLIFSDDEGHGDISSF 60

Query: 69 AALATAIAATVGTGNIVGVATAIKSGGPGALFWMWVAAFFGMATKYAEGLLAIKYRTKDT 128

AALATAIAATVGTGNIVGVATAIKSGGPGALFWMWVAAFFGMATKYAEG+LAIKYRTKD
Sbjct: 61 AALATAIAATVGTGNIVGVATAIKSGGPGALFWMWVAAFFGMATKYAEGVLAIKYRTKDA 120

Query: 129 NGEISGGPMYYIINGMGQKWKPLAVFFSAAGILVALLGIGTFTQVNAIASSLEHTFKIST 188

NG ISGGPMYYI+NGMG KWKPLAV F+ +GILVAL GIGTF QVN+I SSL H+F +S
Sbjct: 121 NGHISGGPMYYIVNGMGTKWKPLAVLFAGSGILVALFGIGTFAQVNSITSSLGHSFGLSP 180

Query: 189 RFTSLIIAVIVLFIIFGGIKSISKVSEKIVPFMAISYILATLIIIAVNYNKIPHTFQLIF 248

+ S++IA+ V IIFGGI SISKV+EK+VPFMAI YIL++L +I +Y ++ +L+F
Sbjct: 181 QMVSIVIAIFVAAIIFGGIHSISKVAEKVVPFMAIFYILSSLAVIFSHYQQLLPVIRLVF 240

Query: 249 SGAFSGTAAIGGFSGAIVKEAIQKGIARGVFSNESGLGSAPIAAAAAKTKEPVEQGLISM 308

AF+ TAAIGGF+G+++K+AIQKGIARGVFSNESGL SAPIAAAAAKT EPVEQGLISM
Sbjct: 241 QSAFTPTAAIGGFAGSLMKDAIQKGIARGVFSNESGLRSAPIAAAAAKTNEPVEQGLISM 300

Query: 309 TGTFIDTIVICTLTGIAILVTGKWLEFDLQGAPLTQASFNTVFGSLGSFALTFCLVLFAF 368

TGTFIDTI+ICTLTG++ILVTG+W L+GAPLTQ++F TVFG+LG+F LTF LVLFAF
Sbjct: 301 TGTFIDTIIICTLTGLSILVTGQWTG-QLEGAPLTQSAFATVFGNLGTFGLTFSLVLFAF 359

Query: 369 TTILGWSYYGERCFEYLFGTKFINAYRIIFVIMVGLGGFLQLDLIWVIADIVNGLMALPN 428

TTILGWSYYGERCFE+LFG + +RI+F++MVGLGGFL+L+LIWV+ADIVNGLMALPN
Sbjct: 360 TTILGWSYYGERCFEFLFGITHLTYFRIVFILMVGLGGFLKLELIWVLADIVNGLMALPN 419

Query: 429 LIALLALSPIIVKETQKYF 447

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1751 

A DNA sequence (GBSx1858) was identified in S.agalactiae <SEQ ID 5439> which encodes the amino
acid sequence <SEQ ID 5440>. Analysis of this protein sequence reveals the following:

Possible site: 33

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -6.16 Transmembrane 85 - 101 ( 80 - 108)

INTEGRAL Likelihood = -5.36 Transmembrane 118 - 134 ( 115 - 137)

INTEGRAL Likelihood = -2.81 Transmembrane 177 - 193 ( 177 - 193)

INTEGRAL Likelihood = -0.48 Transmembrane 49 - 65 ( 49 - 65)

Final Results

bacterial membrane --- Certainty=0.3463 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
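
Localisation reports of this shape (a list of predicted transmembrane segments followed by a certainty per compartment) lend themselves to simple tabulation. The following Python sketch is an illustration only: the function name `parse_localisation_report` and the regular expressions are assumptions written for the layout seen in this document, not part of the prediction program itself. The certainty pattern tolerates stray spaces inside the number, which occur in scanned copies of such reports.

```python
import re

# One predicted segment: likelihood, segment start/end, and the wider window in brackets.
INTEGRAL_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)
# One compartment line; spaces inside the number (e.g. "0 . 3463") are accepted.
CERTAINTY_RE = re.compile(
    r"bacterial\s+(membrane|outside|cytoplasm)\W*Certainty\s*=\s*([0-9][0-9. ]*?)\s*\("
)


def parse_localisation_report(text):
    """Return (segments, certainties) extracted from one report."""
    segments = []
    for m in INTEGRAL_RE.finditer(text):
        start, end, win_start, win_end = (int(g) for g in m.groups()[1:])
        segments.append(
            {"likelihood": float(m.group(1)), "start": start, "end": end,
             "window": (win_start, win_end)}
        )
    certainties = {
        m.group(1): float(m.group(2).replace(" ", ""))
        for m in CERTAINTY_RE.finditer(text)
    }
    return segments, certainties


# Values copied from the report above; spacing in the second certainty is deliberately untidy.
REPORT = """
INTEGRAL Likelihood = -6.16 Transmembrane 85 - 101 ( 80 - 108)
INTEGRAL Likelihood = -0.48 Transmembrane 49 - 65 ( 49 - 65)
bacterial membrane --- Certainty=0.3463 (Affirmative) < succ>
bacterial outside Certainty=0 . 0000 (Not Clear) < succ>
"""
segments, certainties = parse_localisation_report(REPORT)
best_compartment = max(certainties, key=certainties.get)
```

Collecting the segments as dictionaries keeps the likelihood next to its coordinates, so a later step could, for instance, filter on a likelihood threshold of its own choosing.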

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12451 GB:Z99107 alternate gene name: ydxT-similar to cation 

efflux system membrane protein [Bacillus subtilis] 
Identities = 118/282 (41%) , Positives = 181/282 (63%)

Query: 6 ENLQLAKRGPIISIIAYITLAVAKLAAGYWFDATSLVADGFNNLSDILGNVALLIGLHLA 65 

+ L+ + G ++SI AY+ L+ KL GY F + +L ADG NN +DI+ +VA+LIGL ++ 
Sbjct: 5 DELKKGESGALVSIAAYLVLSAIKLI IGYLFHSEALTADGLNNTTDI IASVAVLIGLRI S 64 

Query: 66 SQPADSNHRFGHWKIEDLASLITSFIMEWGIQVFIQTVTKIINNTDTNIDPLGAIVGAI 125 

+P D +H +GH++ E +ASLI SFIM WG+QV I + D + A A 

Sbjct: 65 QKPPDEDHPYGHFRAETIASLIASFIMMWGLQVLFSAGESIFSAKQETPDMIAAWTAAG 124 

Query: 126 SALVMLGVYFYNKQLSQRVKSSALVAASKDNLSDAVTSIGTSIAIIAASLNFPIIDRLAA 185 

A++ML VY YNK+L+++VKS AL+AA+ DN SDA SIGT I I+AA + ID + A 
Sbjct: 125 GAVLMLIVYRYNKI^KKVKSQALLAAAADNKSDAFVSIGTFIGIVAAQFHLAWIDTVTA 184 

Query: 186 IIITYFILKTAYDIFIESAFSLSDGFDDYQLKQYEKAILTIPKISAVKSQRGRTYGSNIY 245 

+1 I KTA+DIF ES+ SL+DGFD + Y++ I I +S +K + R GS ++ 
Sbjct: 185 FVIGLLICKTAWDIFKESSHSLTDGFDIKDISAYKQTIEKISGVSRLKDIKARYLGSTVH 244 



A related DNA sequence was identified in S.pyogenes <SEQ ID 5441> which encodes the amino acid 
sequence <SEQ ID 5442>. Analysis of this protein sequence reveals the following: 



Possible site: 46
>>> Seems to have a cleavable N-term signal seq.
INTEGRAL Likelihood = 
INTEGRAL Likelihood = 
INTEGRAL Likelihood = 
INTEGRAL Likelihood = 
INTEGRAL Likelihood = 



il - 137 ( 114 - 

• 6 - 102 ( 84 - 

'8 - 194 ( 176 -'197; 

69 Transmembrane 50 - 66 ( 50 - 

64 Transmembrane 158 - 174 ( 158 - 174; 



Final Results

bacterial membrane --- Certainty=0.4206 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:CAB12451 GB:Z99107 alternate gene name: ydxT-similar to cation
efflux system membrane protein [Bacillus subtilis]
Identities = 127/280 (45%) , Positives = 187/280 (66%)



P D +H +GH++ E ++SL+ SFIM +VG QVL +SIFS +Q D + A

I KTA+DIF ESS SL+DGFD + + Y++ I +1 + +K +AR GS V++D

+V+E++ DL++ ESH I ++E+ 4



An alignment of the GAS and GBS proteins is shown below. 

Identities = 274/406 (67%) , Positives = 340/406 (83%) , Gaps = 4/406 (0%)

Query: 7 NLQLAKRGPIISIIAYITLAVAKLAAGYWFDATSLVADGFNNLSDILGNVALLIGLHLAS 66

NL4-LA+-M3PI+SII Y++L+VAKL AGY +A+SL+ADGFNNLSDI i GNVALLIGLHLAS 
Sbjct: 8 KLKIARKGPIVSIIVYLSLSVAKLIAGYLIJ^SLIADGFNNLSDIVGWALLIGIjHLAS 67 

Query: 67 QPADSNHRFGHWKIEDLASLITSFIMFWGIQVFIQTVTKIINNTDTNIDPLGAIVGAIS 126 

QPAD+NH+FGHWKIEDL+SL+TSFIMF+VG QV I T+ I + +IDPLGAIVG +S 
Sbjct: 68 QPADANHKFGHWKIEDLSSLVTSFIMFLVGFQVLIHTIKSIFSGQQVDIDPLGAIVGIVS 127 

Query: 127 ALVMLGVYFYNKQLSQRVKSSALVAASKDNLSDAVTSIGTSIAIIAASLNFPIIDRLAAI 186 

A VMLGVY +NK+LS+RVKSSALVAASKDNL+DAVTSIGTSIAIIAASL+ P+ID +AA+
Sbjct: 128 AFimLGVWFNKRLSKKVKSSALVAASKDNLADAVTSIGTSIAIIAASLHLPVIDHIAAM 187 

Query: 187 IITYFILKTAYDIFIESAFSLSDGFDDYQLKQYEKAILTIPKISAVKSQRGRTYGSNIYL 246

IIT+FILKTA+DIF+ES+FSLSDGFD LK+YEKAIL IPKI AVKSQR RTYGSN+YL 
Sbjct: 188 IITFFILKTAFDIFMESSFSLSDGFDSRHLKKYEKAILEIPKIVAVKSQRARTYGSNVYL 247 

Query: 247 DIVLEMNPDLSVFESHAITERVEKLLSDKFSVYDIDIHVEPASIPEDEIFDNVYQKLYKN 306 

DIVLEMNPDLSV+ESH+ITE+VE+LLSD+FS+YDIDIHVEPA IPE+EIFDNV +KLY+ 
Sbjct: 248 DIVLEMNPDLSVYESHSITEKVEQLLSDQFSIYDIDIHVEPAMIPEEEIFDNVAKKLYRY 307

Query: 307 EKIIIAKIPGYETFISPDFYMINEKGNIITSDMLTNATNHSLASNFKYFNVKSISQKTKL 366 

EK+IL+K+P Y+ +1+ F +1+ G + , + W + SNF +F ++SISQKT L 
Sbjct: 308 EKLILSKVPDYDHYIAKSFQLIDANGQTVNYEQFLNQEIY-YPSNFHHFQIESISQKTML 366 

Query: 367 VSYELEGKRHTSIWRRNEKWFLIYHQIT- -AKSSPYKTRRYQITSL 410 

V+Y+L G + TSIWRR+E W L++HQIT AK + T Y+I + 
Sbjct: 367 VTYQLNGNQRTSIWRRHESWSLLFHQITPIAKKQLHHT-HYRIVKM 411 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
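
The BLAST summary lines in these examples (for instance, Identities = 274/406 (67%), Positives = 340/406 (83%), Gaps = 4/406 (0%) for the GAS/GBS alignment above) can be checked mechanically. The following is a minimal sketch, not part of the original analysis: the function name `parse_summary` is hypothetical, and the assumption that the reported percentage is the integer part of count/length is inferred from that one line rather than taken from the BLAST documentation.

```python
import re

# Matches the three fields of a BLAST summary line, e.g.
# "Identities = 274/406 (67%) , Positives = 340/406 (83%) , Gaps = 4/406 (0%)".
FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)")


def parse_summary(line):
    """Return {field: (count, aligned_length, percent)} for one summary line.

    The percentage is recomputed with integer (truncating) division, which is
    an assumption about how the listing was generated.
    """
    stats = {}
    for name, count, length in FIELD.findall(line):
        count, length = int(count), int(length)
        stats[name] = (count, length, count * 100 // length)
    return stats


line = "Identities = 274/406 (67%) , Positives = 340/406 (83%) , Gaps = 4/406 (0%)"
for field, (count, length, percent) in parse_summary(line).items():
    print(f"{field}: {count}/{length} = {percent}%")
```

A recomputed value that differs from the printed one is a useful flag for a digit that may have been misread during text extraction.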

Example 1752 

A DNA sequence (GBSx1859) was identified in S.agalactiae <SEQ ID 5443> which encodes the amino
acid sequence <SEQ ID 5444>. Analysis of this protein sequence reveals the following: 
Possible site: 55
>>> Seems to have [illegible] N-terminal signal sequence
INTEGRAL Likelihood = [illegible] Transmembrane 171 - 187 ( 161 - [illegible])
INTEGRAL Likelihood = [illegible] Transmembrane 118 - 134 ( 113 - [illegible])
INTEGRAL Likelihood = [illegible] Transmembrane 59 - 75 ( 53 - [illegible])
INTEGRAL Likelihood = [illegible] Transmembrane 231 - 247 ( 226 - [illegible])
INTEGRAL Likelihood = [illegible] Transmembrane 86 - 102 ( 84 - [illegible])
INTEGRAL Likelihood = [illegible] Transmembrane 31 - 47 ( 31 - [illegible])

Final Results

bacterial membrane --- Certainty=0.4248 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9869> which encodes amino acid sequence <SEQ ID 9870>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14850 GB:Z99118 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 80/226 (35%), Positives = 136/226 (59%), Gaps = 1/226 (0%) 

Query: 27 TNNPIFGIMLTVWAYYIGIRIFRKYPSPAT-TPLLLATILLIAFLKLTHISYKDYYNGGS 85

T +P FGI+++4- A+ IG +F+K TPL +A +L IAFLK+ SY DY NGG

Sbjct: 4 TMSPYFGIW3LAAFGIGTFLFKKTKGFFLFTPLFVAMVLGIAFLKIGGFSYADYNNGGE 63

Query: 86 FLTMLITPSTVVLAIPLYRTFHLMKHHIKSISISIIIASVINTVFTAIVAKFFGMKYFLA 145

+ + P+T+ AIPLY+ +K + I SII S+ + ++AK + +

Sbjct: 64 IIKFFLEPATIAFAIPLYKQRDKLKKYWWQIMASIIAGSICSVTIVYLLAKGIHLDSAVM 123

Query: 146 ISLFPKSVTTAMAVGITSKAGGLATITLVWVITGILTSVLGPIFLKLLRIEDPVAIGLA 205
S+ P++ TTA+A+ ++ GG++ IT V+ ++ LG +FLK+ ++++P++ GLA

Sbjct: 124 KSMLPQAATTAIALPLSKGIGGISDITAFAVIFNAVIVYALGALFLKVFKVKNPISKGLA 183

Query: 206 LGGTGHAIGTGQALKYGQVQGAMAGLAIGITGICYVIVSPLVAGLI 251
LG +GHA+G ++ G+V+ AMA +A+ + G+ V+V P+ LI
Sbjct: 184 LGTSGHALGVAVGIEMGEVEAAMASIAVVVVGVVTVLVIPVFVQLI 229



No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8893> and protein <SEQ ID 8894> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 0
SRCFLG: 0
McG: Length of UR: 22
Peak Value of UR: 2.57
Net Charge of CR: 0
McG: Discrim Score: 6.51
GvH: Signal Score (-7.5): -5.91
Possible site: 33
>>> Seems to have an uncleavable N-term signal seq
Amino Acid Composition: calculated from 1
ALOM program count: 6 value: -8.12 threshold: 0.0
INTEGRAL Likelihood = -8.12 Transmembrane 149 - 165 ( 139 - 172)
INTEGRAL Likelihood = -6.32 Transmembrane 96 - 112 ( 91 - 116)
INTEGRAL Likelihood = -5.89 Transmembrane 37 - 53 ( 31 - 55)
INTEGRAL Likelihood = -5.52 Transmembrane 209 - 225 ( 204 - 230)
INTEGRAL Likelihood = -3.24 Transmembrane 64 - 80 ( 62 - 81)
INTEGRAL Likelihood = -0.32 Transmembrane 9 - 25 ( 9 - 25)
PERIPHERAL Likelihood = 1.06 121
modified ALOM score: 2.12
icml HYPID: 7 CFP: 0.425

*** Reasoning Step: 3
Final Results

bacterial membrane --- Certainty=0.4248 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF01066(325 - 999 of 1305) 

EGAD|107753|BS2884(4 - 229 of 231) hypothetical protein {Bacillus subtilis} OMNI|NT01BS3363
LrgB GP|1770004|emb|CAA99613.1||Z75208 hypothetical protein {Bacillus subtilis}
GP|2635355|emb|CAB14850.1||Z99118 similar to hypothetical proteins {Bacillus subtilis}

PIR|D69983|D69983 conserved hypothetical protein ysbB - Bacillus subtilis
%Match = 17.2

%Identity = 35.4 %Similarity = 62.4

Matches = 80 Mismatches = 84 Conservative Sub.s = 61

192 222 252 282 312 342 372 402 

WSTFKT*SPIFLG*LSLS*ERYFSIF*LLD^ryPNGSKRDMKEIIQKLEvTCMATLTNNPIFGIMLTVWAYYIGIRIFRKYP 

I :| 111=::: 1= II :|:| ' 
MESTMSPYFGIWSLAAFGIGTFLFKKTK 
10 20



429 459 489 519 549 579 609 639 

SPAT-TPLLIATILLIAFLKLTHISYKDYYNGGSFLTMLITPSTWIAIPIjYRTFHLMKHHIKSISISIIIASVINTVFT 

IIImI =1 llllh II II III = 1 = 1= =11111= : I = I III 1= = 

GFFLFTPLFVAMVTjGIAFLKIGGFSYAD'yNNGGEIIKFFLEPATIAFAIPLYKQRDKLKKYWWQIMASIIAGSICSvTIV
40 50 60 70 80 90 100 

669 699 729 759 789 819 849 ' 879 

AIVAKFFGMKYFLAISLFPKSVTT7AMAVGITSKAGG1ATITLVVWITGILTSVTjGPIFLI^LRIEDPVAIGIiALGGTGH 
::|| : : |::|:: |||:|: =: ||:: || |: :: || : | | | = ::::: | : : ||||| :||

YLIAKGIHLDSAVMKSMLPQAATTAIALPLSKGIGGISDITAFAVIFNAVIWALGALFLIWFKVKNPISKGLALGTSGH 
120 130 140 . 150 160 170 180 

909 939 969 999 1029 1059 1089 1119 

AIGTGQALKYGQVQGAMAGLAIGITGICYVIVSPLVAGLILK*G*QK*TQNNYVIIFKNRI*DK*L*YR*KK*LLERLSV
1 = 1 III =1= = |« 1 = ] h 11 

ALGVAVGIEMGEVFAAMASIAVVWGVVrVLVIPVFVQLIGG 
200 210 220 230 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 
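
The summary figures reported for the ORF01066 comparison above (Matches = 80, Mismatches = 84, Conservative Sub.s = 61, %Identity = 35.4, %Similarity = 62.4) appear to be related by simple ratios over the aligned length. The sketch below is an illustration only: the function name is hypothetical, and the formulas are inferred from the reported numbers (taking the single gap column from the corresponding BLAST header, Gaps = 1/226) rather than from any description of the comparison program. The %Match figure is not reproduced because its definition is not given.

```python
def alignment_percentages(matches, mismatches, conservative, gaps=0):
    """Return (%identity, %similarity), each rounded to one decimal place.

    Assumed definitions (inferred, not documented in the source):
      aligned length = matches + mismatches + conservative + gaps
      %identity      = matches / aligned length
      %similarity    = (matches + conservative) / aligned length
    """
    aligned = matches + mismatches + conservative + gaps
    identity = 100.0 * matches / aligned
    similarity = 100.0 * (matches + conservative) / aligned
    return round(identity, 1), round(similarity, 1)


# Example 1752 figures; the gap count comes from "Gaps = 1/226".
print(alignment_percentages(80, 84, 61, gaps=1))
```

Under the same assumed definitions, the figures given later for the beta-lactam resistance factor comparison (235 matches, 103 mismatches, 72 conservative substitutions) give 57.3 and 74.9, matching the values reported there.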

Example 1753 

A DNA sequence (GBSx1860) was identified in S.agalactiae <SEQ ID 5445> which encodes the amino
acid sequence <SEQ ID 5446>. Analysis of this protein sequence reveals the following: 

Possible site: 28

>>> May be a lipoprotein

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 



Query: 21 TACSSSNTQQTSTSKSNVSQHKNIKADHEELRLKFNKVKLGVKANNFKGGTSLAELKQLF 80

T S ++T++ S+ K + + K D+ ++K+ +G N+ +GG++ E+K +

Sbjct: 60 TNSSKNDTKKESSEKKSEDKSK----KNSDLKATYDKINVGDIMNSSEGGSTEDEVKAIL 115



Query: 81 GGEPNEKFDTPAGNVTLKGYRW-NVDD----ISITIQLLNDSSIVRSISNFKFIRDANIT 135

[match line and Sbjct row illegible in the source]

Query: 136 TKDYNSLKNGMSYN--KVKELLGEPDDISQAVSSDKEELQAAWISGIQSSDSDPGINLTF 193
N++ SY+ + ++ LG+P 1+ + ++ W+ + D + ++F

Sbjct: [illegible]-DGDLGATVTVSF 233

Sbjct: 234 SNGNAISKSSSGLK 247

A related DNA sequence was identified in S.pyogenes <SEQ ID 5447> which encodes the amino acid
sequence <SEQ ID 5448>. Analysis of this protein sequence reveals the following: 

Possible site: 21 

>>> May be a lipoprotein 

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAA76857 GB:Y17797 hypothetical protein [Enterococcus faecalis] 
Identities = 34/166 (20%), Positives = 74/166 (44%), Gaps = 8/166 (4%) 

Query: 47 HQDKRANFEKIKIATVDSSFTGGTSLEELISLFGEPSQHDPKTAGEVTIDAYTWQFDQ-- 104

+ D +A ++KI + + +S GG++ +E+ ++ GEP+ ++ +W + 

Sbjct: 83 NSDLKATYDKINVGDIMNSSEGGSTEDEVKAILGEPASSSTTDIQ/3ISTTTLSWTNVKGG 142 

Query: 105 - 

Sbjct: : 

Query: 160 SQASSSDHQTLQAIWVSGLKTDTSGANISLVFENNQLTEMSQVGLE 205 

+ + + + IW+ L D GA +++ F N S GL+ 

Sbjct: 203 TSTNINGEKNDTLIWMKNLDGDL-GATVTVSFSNGNAISKSSSGLK 247 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 84/199 (42%) , Positives = 126/199 (63%) , Gaps = 3/199 (1%) 

Query: 11 TIVCLSFLG--LTACSSSNTQQTSTSKSNVSQHKNIKADHEELRLKFNKVKLGVKANNFK 68 

T++ +SF L ACS++ ++ S S + + +A H++ R F K+KL ++F 
Sbjct: 8 TLLLISFFTSFLVACSTTKDKEPQPSDSEIITPRLHQAAHQDKRANFEKIKLATVDSSFT 67 

Query: 69 GGTSLAELKQLFGGEPNEKFDTPAGNVTLKGYRWNVDDISITIQLLNDSSIVRSISNFKF 128

GGTSL EL LFG EP++ AG VT+ Y W D +++T+ L +SSIV++ISNF F

Sbjct: 68 GGTSLEELISLFG-EPSQHDPKTAGEVTIDAYTWQFDQVTLTVWLYQNSSIVKTISNFTF 126

Query: 129 IRDANITTKDYNSLKNGMSYNKVKELLGEPDDISQAVSSDKEELQAAWISGIQSSDSDPG 188

R+ ++ K+Y L+ GMSY VK++L EPD+ SQA SSD + LQA W+SG+++ S 
Sbjct: 127 ARELGLSQKEYQQLQKGMSYEDVKKILTEPDNYSQASSSDHQTLQAIWVSGLKTDTSGAN 186 

Query: 189 INLTFENDKLTNKQQHGLK 207 

I+L FEN++LT Q GL+ 
Sbjct: 187 ISLVFENNQLTEMSQVGLE 205 

SEQ ID 5446 (GBS650) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 178 (lane 9; MW 28kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1754

A DNA sequence (GBSx1861) was identified in S.agalactiae <SEQ ID 5449> which encodes the amino
acid sequence <SEQ ID 5450>. This protein is predicted to be ribosomal protein S1 homolog; Sequence
specific DNA-binding protein (r. Analysis of this protein sequence reveals the following:

3 N- terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2950 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9363> which encodes amino acid sequence <SEQ ID 9364> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.



Query: 1 MEARKAWDKLVGREGEVVTVKGTRAVKGGLSVEFEGLRGFIPASMIDTRFVRNTEKFVGQ 60

++ARKAW+ L EG+ V K AV+GGL V+ G+RGF+PASM+ RFV + +F + 

Sbjct: 53 LDARKAWENLSFAEGDTVDAKVINAWGGLIVDVNGVRGFVPASMVAERFVSDLNQFKNK 112 

Query: 61 EFDAKIKEVDAAENRFILSRREVVEESAAAARKEVFSNIEVGSVVTGKVARLTSFGAFID 120

+ A++ E+D A R ILSR+ V + AA EVPS + VG W G VARLT FGAF+D 

Sbjct: 113 DIKAQVIEIDPANARLILSRKAVAAQERAAQLAEVFSKLSVGEVVEGTVARLTDFGAFVD 172 

Query: 121 LGGVDGLVHVTELSHERNVSPKSVVTVGEEVEVKVLSIDEEAGRVSLSLKATTPGPWDGV 180

LGGVDGLVHV+E+SH+R +P V+T G++V+VK+L++D E GR+SLS+KAT GPWD 

Sbjct: 173 I^GVDGLVHVSEISHDRVKNPADVLTKGDKvDVKIIALDTEKGRISLSIKATQRGPWDEA 232 

Query: 181 EQKLAAGDVIEGKVKRLTDFGAFVEVLPGIDGLVHISQISHKRVENPKDVLSAGQEVTVK 240

++AAG V+EG VKR+ DFGAFVE+LPGI+GLVH+SQIS+KR+ENP +VL 4G +V VK 

Sbjct: 233 ADQIAAGSVLEGTVKRVKDFGAFVEILPGIEGLVHVSQISNKRIENPSEVLKSGDKVQVK 292 



Query: 241 \ 

VL++ ER+SLSMKALEE+P + E R+ R + Y+ + + ++ 

Sbjct: 293 VLD I KPAEERI SLSMKALEEKP EREDRRGNDGSASRADIAAYK-QQDDSAATLG 345 

Query: 301 DLFGD 305 

Sbjct: 346 DIFGD 350 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5451> which encodes the amino acid
sequence <SEQ ID 5452>. Analysis of this protein sequence reveals the following: 



3 N-terminal signal i 



Final Results 

bacterial cytoplasm --- Certainty=0.3312 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 284/309 (91%), Positives = 296/309 (94%), Gaps = 1/309 (0%) 

Query: 1 MEARKAWDKLVGREGEVVTVKGTRAVKGGLSVEFEGLRGFIPASMIDTRFVRNTEKFVGQ 60

+EARKAWDKLVGREGEVVTVKGTRAVKGGLSVEFEGLRGFIPASMIDTRFVRNTEKFVGQ
Sbjct: 93 LEARKAWDKLVGREGEVVTVKGTRAVKGGLSVEFEGLRGFIPASMIDTRFVRNTEKFVGQ 152

Query: 61 EFDAKIKEVDAAENRFILSRREVVEESAAAARKEVFSNIEVGSVVTGKVARLTSFGAFID 120

EFDAKIKEVDAAENRFILSRREV+EE+A AR EVFS I G+VVTG VARLTSFGAFID
Sbjct: 153 EFDAKIKEVDAAENRFILSRREVIEEAAKEARAEVFSKISEGAVVTGTVARLTSFGAFID 212

Query: 121 LGGVDGLVHVTELSHERNVSPKSVVTVGEEVEVKVLSIDEEAGRVSLSLKATTPGPWDGV 180

LGGVDGLVHVTELSHERNVSPKSVV+VGEEVEVKVLSIDEEAGRVSLSLKATTPGPWDGV
Sbjct: 213 LGGVDGLVHVTELSHERNVSPKSVVSVGEEVEVKVLSIDEEAGRVSLSLKATTPGPWDGV 272

Query: 181 EQKLAAGDVIEGKVKRLTDFGAFVEVLPGIDGLVHISQISHKRVENPKDVLSAGQEVTVK 240

EQKLA GDV+EGKVKRLTDFGAFVEVLPGIDGLVHISQISHKRVENPKDVLS GQEVTVK
Sbjct: [row illegible in the source]

Query: 241 VLEVNSDAERVSLSMKALEERPAQAEGE-KEEKRQSRPRRPRRQEKRDYELPETQTGFSM 299

VLEVN+ ERVSLS+KALEERPAQAEG+ KEEKRQSRPRRP+R+ +RDYELPETQTGFSM
Sbjct: 333 VLEVNAADERVSLSIKALEERPAQAEGDNKEEKRQSRPRRPKRESRRDYELPETQTGFSM 392

Query: 300 ADLFGDIEL 308

ADLFGDIEL
Sbjct: 393 ADLFGDIEL 401

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1755 

A DNA sequence (GBSx1862) was identified in S.agalactiae <SEQ ID 5453> which encodes the amino
acid sequence <SEQ ID 5454>. This protein is predicted to be dihydroorotate dehydrogenase A (pyrD).
Analysis of this protein sequence reveals the following: 

Possible site: 33 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1708 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB51330 GB:AJ131985 dihydroorotate dehydrogenase [Streptococcus pneumoniae] 
Identities = 227/310 (73%) , Positives = 268/310 (86%) 



Query: 1 MVSLKTEIAGFSFDNCLMNAAGIYCMTKEELLAIENSEAGSFVTKTGTLEAREGNPQPRY 60 

MVS KT+IAGF FDNCLMNAAG+ CMT EEL ++NS AG+FVTKT TL+ R+GNP+PRY 
Sbjct: 1 MVSTKTQIAGFEFDNCLMNAAGVACMTIEELEEVKNSAAGTFVTKTATLDFRQGNPEPRY 60

Query: 61 ADTDWGSINSMGLPNKGIDYYLDFVTELQDQDNSKNHVLSLVGLSPEETHIILKKVENSS 120

D GSINSMGLPN G+DYYLD++ +LQ++++++ LSLVG+SPEETH ILKKV+ S
Sbjct: 61 QDVPLGSINSMGLPNNGLDYYLDYLLDLQEKESNRTFFLSLVGMSPEETHTILKKVQESD 120

Query: 121 YNGLIELNLSCPNVPGKPQIAYDFEMTDLILSEIFSYYQKPLGIKLPPYFDIVHFDQAAT 180 

+ GL ELNLSCPNVPGKPQIAYDFE TD IL+E+F+Y+ KPLGI KLPPYFDIV+ FDQAA 
Sbjct: 121 FRGLTELNLSCPNVPGKPQIAYDFETTDRILAEVFAYFTKPLGIKLPPYFDIVYFDQAA& 180 

Query: 181 IFNKYPLAFINCVNSIGNGLVIDDETVVIKPKNGFGGIGGDFIKPTALANVHAFYKRLNP 240

IFNKYPL F+NCVNS IGNGL I+DE+WI+ PKNG FGG I GG++I KPTALANVHAFY+RLNP 
Sbjct: 181 IFNKYPLKBVNCTNSIGNGLYIEDESWIRPKDIGFGGIGGEYIKPTALANVHAFYQRLNP 240 

Query: 241 SIKIIGTGGVKNGRDAFEHILCGASMVQIGTALQKEGPEIFQRVSRELKEIMADKGYQSL 300 

I+IIGTGGV GRDAFEHILCGASMVQ+GT h KEG F R++ ELK IM +KGY+SL 
Sbjct: 241 QIQIIGTGGVLTGRDAFEHILCGASMVQVGTTLHKEGVSAFDRITNELKAIMVEKGYESL 300

Query: 301 EDFRGQLNYL 310 

EDFRG+L Y+ 
Sbjct: 301 EDFRGKLRYI 310 
A related DNA sequence was identified in S.pyogenes <SEQ ID 5455> which encodes the amino acid
sequence <SEQ ID 5456>. Analysis of this protein sequence reveals the following: 

3 N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2689 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 239/309 (77%) , Positives = 252/309 (84%) 

Query: 1 MVSLKTEIAGFSFDNCLMNAAGIYCMTKEELLAIENSEAGSFVTKTGTLEARKGNPQPRY 60

MVS T4-I FSFDNCLMNAAG+YCMTKEEL+ +E S+A SFVTKTGTLE R GNP+PRY 
Sbjct: 5 MVSTATQIGHFSFDNCLMNAAGVYCMTKEELMEVEKSQAASFVTKTGTLEVRPGNPEPRY 64 

Query: 61 ADTDWGSINSMGLPNKGIDYYLDFVTELQDQDNSKNHVLSLVGLSPEETHIILKKVENSS 120 

ADT GSINSMGLPN G YYLDFV++L K H LS+VGLSP ET ILK + S 

Sbjct: 65 ADTRLGSINSMGLPNNGFRYYLDFVSDLAKTGQHKPHFLSVVGLSPTETETILKAIMASD 124

Query: 121 YNGLIELNLSCPNVPGKPQIAYDFEMTDLILSEIFSYYQKPLGIKLPPYFDIVHFDQAAT 180 

Y GL+ELNLSCPNVPGKPQIAYDFE TD +L IF+YY KPLGIKLPPYFDIVHFDQAA 
Sbjct: 125 YEGLVELNLSCPNVPGKPQIAYDFETTDQLLENIFTYYTKPLGIKLPPYFDIVHFDQAAA 184 

Query: 181 IFNKYPLAFINCVNSIGNGLVIDDETVVIKPKNGFGGIGGDFIKPTALANVHAFYKRLNP 240

IFNKYPL+F4NCVNSIGNGLVI DE V+1KPKNGFGGIGGD+IKPXALANVHAFYKRL P 
Sbjct: 185 IFNKYPLSFUNCVNSIGNGLVIKDEQVLIKPKNGFGSIGGDYIKPTALANVHAFYKRLKP 244 

Query: 241 SIKIIGTGGVKNGRDAFEHILCGASMVQIGTALQKEGPEIFQRVSRELKEIMADKGYQSL 300

SI IIGTGGVK GRDAFEHILCGASMVQIGTAL +EGP IF+RV++ELK IM +KGYQSL
Sbjct: 245 SIHIIGTGGVKTGRDAFEHILCGASMVQIGTALHQEGPAIFERVTKELKTIMVEKGYQSL 304

Query: 301 EDFRGQLNY 309 

+DFRG L Y 
Sbjct: 305 DDFRGNLRY 313 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1756 

A DNA sequence (GBSx1863) was identified in S.agalactiae <SEQ ID 5457> which encodes the amino
acid sequence <SEQ ID 5458>. This protein is predicted to be beta-lactam resistance factor. Analysis of this
protein sequence reveals the following: 

o N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4437 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



>GP:CAB89121 GB:AJ277485 beta-lactam resistance factor 
[Streptococcus pneumoniae] 
Identities = 238/410 (58%) , Positives = 304/410 (74%) 

Query: 1 ^^KELTAKEFESYSGNYDLQSFMQTPEMAKLI J KKRGYDITY^IGYQIDGKMEIISIVYTI 60 
Query: 61 PMTGGLHMEVNSGPAHSNSKYLKHFYKELQNYAKSQGALELLIKPYDTYQEFTGEGKPKG 120

PM GGLHME+NSGP ++ Ii FY EL+ YAK G LELL+KPY+TYQ F +G P 
Sbjct: 61 PMLGGLHMELNSGPIYTQQDALPVFYAELKEYAKQNGVLELLVKPYETYQTFDSQGNPID 120 

Query: 121 APNTYLIDDLTSIGYHHDGLHIGYPGGEPDWHYVKNLEGITPQNLLKSFSKKGRPLVKKA 180

A +1 DLT +GY DGL GYPGGEPDW Y K-L, +T ++LLKSFSKKG+PLVKKA
Sbjct: 121 AEKKSIIQDLTDLGYQFDGLTTGYPGGEPDWLYYKDLTELTEKSLLKSFSKKGKPLVKKA 180

Query: 181 MSFGIKIRVLKREELHIFKDITSSTSDRRDYMDKSLDYYQDFYDSFGDKAEFVIATLNFR 240

+FGI+++ LKREEL IFK+IT TS+RR+Y DKSL+YY+ FYD+FG+4AEF+IA+LNF 
Sbjct: 181 ETFGIRLKKLKREELSIFKNITKETSERREYSDKSLEYYEHFYDTFGEQAEFLIASLNFS 240 

Query: 241 EYDHNLQLNAKKLEEQITVLDNRHQNNTDSAKYHRQRTELVNQLASLDKRRKEVEPFIQK 300

+Y LQ KLEE + L NSK QE+Q+ + R+E I+K 

Sbjct: 241 DYMSKLQGEQSKLEEMLDKLRLDLSKNPHSEKKQNQLRHYSSQFETFEWKAEARDLIEK 300 

Query: 301 FGNQDVVIAGSLFIYSPKETVYLFSGSYTEFNKFYAPAVLQEYVMQEALKRQSTFYNFLG 360

+G +D+VLAGSLF+Y P+ET YLFSGSYTEFNKFYAPA+LQ+YVM E++KR YNFLG
Sbjct: 301 YGEEDIVLAGSLFVYMPQETTYLFSGSYTEFNKFYAPALLQKYVMLESIKRGIPKYNFLG 360

Query: 361 IQGNFDGSDGVLRFKQNFNGYIVRKMGTFRYYPNPLKYKSIQLLKKILRR 410 

IQG FDGSDGVLRFKQNFNGYIVRK GTFRY+P+PLKYK+IQLLKKI+ R 
Sbjct: 361 IQGIFDGSDGVLRFKQNFNGYIVRKAGTFRYHPSPLKYKAIQLLKKIVGR 410 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5459> which encodes the amino acid
sequence <SEQ ID 5460>. Analysis of this protein sequence reveals the following: 

3 N- terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2652 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 216/410 (52%) , Positives = 291/410 (70%) 
Sbjct: 

Query: 61 PMTGGLHMEVNSGPAHSNSKYLKHFYKELQNYAKSQGALELLIKPYDTYQEFTGEGKPKG 120 

+ GG ME+N+GP ++ + L+HFY +L++YAK + +EL++KPYD YQ F +G P 
Sbjct: 61 KVAGGWRMELNAGPNTNHPEELEHFYTQLKDYAKQKDVIELILKPYDNYQSFDTDGIPIS 120

Query: 121 APNTYLIDDLTSIGYHHDGLHIGYPGGEPDWHYVKNLEGITPQNLLKSFSKKGRPLVKKA 180 

PNT LI LT++GY HDGL GYP GEP WHYVK LEGI L +SFSKKG+ L+KKA 
Sbjct: 121 RPNTDLISLLTAD3Y-1CHDGLKTGYPEGEPWHYVKKLEGIDSSRLTRSFSKKGKALIKKA 180 

Query: 181 MSFGIKIRVLKREELHIFKDITSSTSDRRDYMDKSLDYYQDFYDSFGDKAEFVIATLNFR 240 

+FGIK+R LKR+ELH FK+IT +TSDRRDY+DKSL YYQDFYDSFGD EF++ATLNF 
Sbjct: 181 NTFGIKLRQLKRDELHHFKEITEATSDRRDYLDKSLSYYQDFYDSFGDSCEFMVATLNFE 240

Query: 241 EYDHNLQLNAKKLEEQITVLDNRHQNNTDSAKYHRQRTELVNQLASLDKRRKEVEPFIQK 300

+Y +NL+ +L 1+ NSK +EL+Q+ RE F+++ 

Sbjct: 241 DYLNNLKQRQLQIATSINKVKGDLGKNPKSEKKQWLKELSSQFETFQVRISEALHFLEE 300 

Query: 301 FGNQDVVIAGSLFIYSPKETVYLFSGSYTEFNKFYAPAVLQEYVMQEALKRQSTFYNFLG 360

+G +DV LAGSLFIY4 +E VYLFSGSY +FNKFY+PA+LQE+ M +A+ + YNFLG 
Sbjct: 301 YGTKDVFIAGSLFIYTEQEAVYLFSGSYPKFNKFYSPALLQEHAMLKAIHKGIKQYNFLG 360 

Query: 361 IQGNFDGSDGVLRFKQNFNGYIVRKMGTFRYYPNPLKYKSIQLLKKILRR 410 
I G FDGSDGVLRFKQNFNG+I++K GTFR YP P+KY I+L KK+L R
Sbjct: 361 ITGKFDGSDGVLRFKQNFNGFILQKPGTFRCYPFPIKYHFIRLAKKLLNR 410 

A related GBS gene <SEQ ID 8895> and protein <SEQ ID 8896> were also identified. Analysis of this 
protein sequence reveals the following: 

Homology to resistance proteins 

The protein has homology with the following sequences in the databases: 

57.4/74.9% over 409aa 

Streptococcus 



ORF01118(301 - 1530 of 1833) 

GP|7649683|emb|CAB89121.1||AJ277485(1 - 410 of 410) beta-lactam resistance factor
{Streptococcus pneumoniae}
%Match = 39.0

%Identity = 57.3 %Similarity = 74.9
Matches = 235 Mismatches = 103 Conservative Sub.s = 72

240 270 300 330 360 390 420 450 

I P VM^LYKASNYVYALRKKKNS * LGKOT^ 

III II s||s:|| :|in= :| I I : I I I I 1 = : : I : 

MfllTTLTKEEFQTySDQVSSRSFMQSVQMGDLLEKRGARIVYLALKQEGE 



TSIGYHHDGLHIGYPGGEPDWHYVKNLEGITPQNLLKSFSKKC-RPLVKKAMSFGIKIRVLKREELHIFKDITSSTSDRRD 
I =11= III lllllllll I hi =1 ==lllllllll=llllll =111=== llllll 111=11 11=11= 
TDLGYQFDGLTTGYPGGEPDWLYYKDLTELTEKSLLKSFSKKGKPLVKKAETFGIRLKKLKREELSIFKNITKETSERRE 
140 150 160 170 180 190 200 210 

960 990 1020 1050 1080 1110 1140 1170 

YMDKSLDYYQDFYDSFGDKAEFVIATMFREYDH^ 

I 1111 = 11= lll = l|::|||:|| = lll =1 II Mil =1 I II = I I =1= = = = I 

YSDKSLEYYEHFYDTFGEQAEFLIASmFSDYMSKLQGEQSKLEENLDKLRLDLSKNPHSEKKQNQLREYSSQFETFEVR 
220 230 240 250 250 270 280 290 

1200 1230 1260 1290 1320 1350 1380 1410 

RKEVEPFIQKFGNQDVVIAGSLFIYSPKETvYLFSGSYTEFK-KFYAPAVLQEYvMQEALKRQSTFYXFLGIQGNFDGS G 

= i =1 = 1 = 1 -•i = iniiii=! i-.ii iii 1 1 ii i mini ii-.i i = m i==ii i mm iiii = i 

KAEARDLIEKTGEEEIVIAGSLFVYMPQETTYLFSGSCT^ 

300 310 320 330 340 350 360 370 

1440 1470 1500 1530 1560 1590 1620 1650 

VLXFKQNFNGYIWKMGTFRYYPNPLKYKSlQLLKKILRRT*KISLHKLIFYAL*KASFISLLbLFIQTIMFVI*RNFIT 

ii iiiiiiiiiiii iim = i=iiiii = iumi= i 

VLRFKQNFNGYIVRKAGTFRYHPSPLKYKAIQLLKKIVGR 
380 390 400 410 

SEQ ID 8896 (GBS 198) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 26 (lane 6; MW 48.8kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 85 (lane 6; MW 73.8kDa). 

GBS198-GST was purified as shown in Figure 223, lane 4. 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1757 

A DNA sequence (GBSx1864) was identified in S.agalactiae <SEQ ID 5461> which encodes the amino
acid sequence <SEQ ID 5462>. This protein is predicted to be MurM protein. Analysis of this protein 
sequence reveals the following: 



Final Results

bacterial cytoplasm --- Certainty=0.4418 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB89539 GB:AJ2E0767 MurM protein [Streptococcus pneumoniae] 
Identities = 204/410 (49%) , Positives = 286/410 (69%) , Gaps = 17/410 (4%) 

Query: 1 MYRE---ITAVEHDRFVSESNQTNLLQSSNWPKVKDNWGSQLLGFFDGETQIASASILIK 57
MYR I +E+D+FV E N+LQSS W KVK +W + LG ++GE +A AS+LIK
Sbjct: 1 MYRYQIGIPTLEYDQFVKEHELANVLQSSAWEKVKSDWNHERLGVYEGENLLAVASVLIK 60

Query: 58 SLPLGFSMLYIPRGPIMDYSNLDIVTKVLKDLKAFGKKQRALFIKCDPLIYLK--MVNAK 115
SLPLG+ M YIPRGPI+DY + +++ VL+ +K++ + +RA+F+ DP I L +VN
Sbjct: 61 [row illegible in the source]

[Sequence rows are illegible in the source for the rows starting at Query 116 / Sbjct 119, Query 176 / Sbjct 177, Query 236 / Sbjct 237, Query 294 / Sbjct 294 and Query 354 / Sbjct 354. Surviving match-line fragments, in source order:]

AIRT++NKG++IQ+G ELL+ F+ELMKKTE RK I+LR YY+KLLD 4

+LDV+KRL ++E+ Q+A+++ ++ E R KV+ + +RL +EIDFL H

IPLAATL+LEFG TS I>

EN L+GGLYHFK KF P IEE++GEF +

A related DNA sequence was identified in S.pyogenes <SEQ ID 5463> which encodes the amino acid
sequence <SEQ ID 5464>. Analysis of this protein sequence reveals the following:

N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.2239 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 203/399 (50%) , Positives = 274/399 (67%) , Gaps = 4/399 (1%) 
Query: 5 ITAVEHDRFVSESNQTNLLQSSNWPKVKDNWGSQLLGFFDGETQIASASILIKSLPLGFS 64

I+ EHD+FV Q LLQSS W KVKDNW + + F++ Q+A+A+ LI+ LPLGF+
Sbjct: 13 ISPEEHDQFVIAQPQAGLLQSSKWGKVKDNWKHERISFYENGVQVARARCLIRKLPLGFT 72

Query: 65 MLYIPRGPIMDYSNLDIVTKVLKDLKAFGKKQRALFIKCDPLIYLKMVNAKDFENSPDEK 124 

M+YIPRGPIMDY+N +++ V+K LK FGK +RALFIK DP + +K + + S + 

Sbjct: 73 MIYIPRGPIMDYANFELLDFVTKTLKTFGKSKRALFIKIDPSLVIKQT- -LEGKESKEND 130 

Query: 125 EGLIAIDHLQPAGADWTGRTTDLAHTIQPRFQANLYANQFGLDKMSKKTRQAIRTSKNKG 184

L I L++ G +W+GRT +L TIQPR QAN+YA F D + KK +Q+IRT+ NKG 
Sbjct: 131 VTLSIIAFLKKLGVEWSGRTKELEDTIQPRIQANIYAKDFDFDSLPKKAKQSIRTATNKG 190 

Query: 185 VDIQFGSHELLEDFAELMKKTEDRKGINLRG1DYYQKLLDTYPNNSY1TMASLDVAKRLE 244 

V++ G ELL+DF+ LMKKTE+RKGI LRG YYQKLL Y SYITMASLD+ ++ + 
Sbjct: 191 TOVTIGGSELLDDFSALMKKTENRKGIILRGKSYYQKLLGIYAGQSYITMASIjDLPEQKK 250 

Query: 245 KI EKECQIAQSER I KSLELNREKKVKQHQGT IDRL1IKE I DFLKEAQKAYDRD 1 1 PLAATL 304 

+ ++ A +E+ + + ++ KV ++Q TI RL K++ L E Q A + I PLAATL 
Sbjct: 251 LLIQaLDKALAEQARLTDKSKPSKVAENQKTIARLQI<XILTILSE-QLATGQTRIPIAATL 309 

Query: 305 TLEFGNTSENIYAGMDDYFKSYSAPIYTWFETAQRAFERGNIWQNMGGIENDLSGGLYHF 364 

TL +G TSEN+YAGMDD +++Y AP+ TW+ETA+ AF+RG W N+GG+EN GGLYHF 
Sbjct: 310 TLIYGETSENLYAGMDDDYROTQAPLL1WYETAKEAFKRGCRWHNLGGVE1IQQDGGLYHF 369 

Query: 365 KSKFEPIIEEFIGEFNIPVNRLLYKASNYVYALRKKRNS 403

K++ P IEEF GEFNIPV L+ + Y LRKK S 
Sbjct: 370 KARLNPTIEEFAGEFNIPVG-LVSSLAILTYKLRKKLRS 407 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1758 

A DNA sequence (GBSx1865) was identified in S.agalactiae <SEQ ID 5465> which encodes the amino
acid sequence <SEQ ID 5466>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2669 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
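
Each "Final Results" block in these examples lists a certainty value per compartment, and the compartment marked Affirmative is the one with the highest certainty. The sketch below shows one way such a block could be read programmatically. It is an illustration under stated assumptions: the function name and regular expression are hypothetical, and the input is the cleaned-up form of the Example 1758 block rather than raw program output.

```python
import re

# One result line looks like:
# "bacterial cytoplasm --- Certainty=0.2669 (Affirmative) < succ>"
LINE = re.compile(
    r"bacterial\s+(\w+)\s*-*\s*Certainty\s*=\s*([\d.]+)\s*\((.*?)\)"
)


def best_localization(block):
    """Return (compartment, certainty, flag) for the highest-certainty line."""
    rows = [(float(value), name, flag) for name, value, flag in LINE.findall(block)]
    if not rows:
        raise ValueError("no localization lines found")
    certainty, name, flag = max(rows)
    return name, certainty, flag


# Values transcribed from Example 1758 (GBSx1865).
EXAMPLE_1758 = """
bacterial cytoplasm --- Certainty=0.2669 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""

print(best_localization(EXAMPLE_1758))
```

Blocks in which every compartment reads 0.0000 (Not Clear), as for the predicted lipoproteins, have no meaningful maximum and would need to be handled separately.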



Example 1759 

A DNA sequence (GBSx1866) was identified in S.agalactiae <SEQ ID 5467> which encodes the amino
acid sequence <SEQ ID 5468>. This protein is predicted to be beta-lactam resistance factor. Analysis of this
protein sequence reveals the following:

Possible site: 41 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.07 Transmembrane 56 - 72 ( 55 - 74)

Final Results

bacterial membrane --- Certainty=0.1829 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9625> which encodes amino acid sequence <SEQ ID 9626> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB89120 GB:AJ277484 beta-lactam resistance factor 
[Streptococcus pneumoniae] 
Identities = 166/410 (40%) , Positives = 250/410 (60%) , Gaps = 10/410 (2%) 

Query: 6 MYHVTVGISEKEYDAFAIASSQTNLLHSSKWAQVKSNWQNERLGFYKDDQLVAVASILIK 65
MY +GI EYD F N+L SS W +VKSNWQ+E+ G Y++++L+A ASILI +
Sbjct: 1 MYRYQIGIPTLEYDQFVKEHELANVLQSSAWEEVKSNWQHEKFGVYREEKLLATASILIR 60

Query: 66 SLPLGFTMLYIPRGPIMDYSNKELVNFVLKTLKNFGRKKRAVFAKFDPALLLRQYHLKEE 125
+LPLG+ M YIPRGPI+DY +KEL+NF ++++K++ R KRAVF FDP++ L Q + +E
Sbjct: 61 [row illegible in the source]

[Sequence rows are illegible in the source for the remaining rows, whose surviving coordinates are Query 126 / Sbjct 121, Sbjct 179, Query 246 / Sbjct 239, Query 303 / Sbjct 296 and Query 362 / Sbjct 356. Surviving match-line fragments, in source order:]

E E+ 1D+L+ G +W G T+ -I

I- +R +++ Q+ K+ +LE

+YAGMDD F+++

SLN GL FK FNP EEY+GEF + +P LY L LA RK R H

A related DNA sequence was identified in S.pyogenes <SEQ ID 5469> which encodes the amino acid 
sequence <SEQ ID 5470>. Analysis of this protein sequence reveals the following: 

Possible site: 32 



Final Results 

bacterial membrane --- Certainty=0.1128 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB89120 GB:AJ277484 beta-lactam resistance factor 
[Streptococcus pneumoniae] 
Identities = 166/402 (41%) , Positives = 255/402 (63%) , Gaps = 5/402 (1%)

Query: 9 KIGISEEEHDSFVKEHQQISVLQGSDWAKIKNQWQNERIGIYKEEKQVASLSLLIKLLPL 68 

+IGI E+D FVKEH+ +VLQ S W ++K+ WQ+E+ G+Y+EEK +A+ S+LI+ LPL 
Sbjct: 5 QIGIPTLEYDQFVKEHELANVLQSSAWEEVKSNWQHEKFGVYREEKLLATASILIRTLPL 64



Query: 69 GRSIIYIPRGPVMDYLDRDLVAFTMKILKDYGKTKKALFIKYDPAILLKQYALGQEEEEK 128 
G + YIPRGP++DY D++L+ F ++++K Y ++K+A+F+ +DP+I L Q + QE+ E 
Sbjct: 65 [row illegible in the source]

[Sequence rows are illegible in the source for the rows starting at Query 129 / Sbjct 125, Query 189 / Sbjct 185, Query 249, Query 306 / Sbjct 303 and Query 366. Surviving match-line fragments, in source order:]

P LA I +LQ+ GV W+G T E+ D+IQPR QA IY + 3 + K T++ 1+ A+ 4

L FS+++ TEKRK I LRNEAY++KL+

+ R K+E QK

PK F PTIEE++GEF +P PLY + RK L+ KH

Sbjct: 363 HFKEKFNPTIEEYLGEFTMPTHPLYPLLRLALDFRKTLRKKH 404

An alignment of the GAS and GBS proteins is shown below. 

Identities = 226/407 (55%) , Positives = 318/407 (77%) , Gaps = 3/407 (0%)

Query: 5 LMYHVTVGISEKEYDAFAIASSQTNLLHSSKWAQVKSNWQNERLGFYKDDQLVAVASILI 64
L ++ +GISE+E+D+F Q ++L S WA++K+ WQNER+G YK+++ VA S+LI
Sbjct: 4 LTFYAKIGISEEEHDSFVKEHQQISVLQGSDWAKIKNQWQNERIGIYKEEKQVASLSLLI 63

Query: 65 KSLPLGFTMLYIPRGPIMDYSNKELVNFVLKTLKNFGRKKRAVFAKFDPALLLRQYHLKE 124
K LPLG +++YIPRGP+MDY +++LV F +KTLK++G+ K+A+F K+DPA+LL+QY L +
Sbjct: 64 [row illegible in the source]

[Sequence rows are illegible in the source for the rows starting at Query 125 / Sbjct 124, Query 185 / Sbjct 182, Query 245 / Sbjct 242 and Query 305 / Sbjct 302. Surviving match-line fragments, in source order:]

-+IQPRFQANIYT+ N+E FPKHT+RL

IKDAK RGV+ YR +• +L KF+ +V+LTE RK —LRNE YF +LMT YG+ AYL+LAK

VN+P++L Q+++QL+ I +D++ T +HQK RL +L Q+AS+++YI EF+ F+ +YP+E

V+AGILSI +GNV+EMLYAGM+D F+KFYPQYLL VF+DA+++ 1+ AN+GGVEGSL+

Query: 365 [row illegible in the source]
DGLTKFK+NF P EE+IGEFNL ++P LY +AN Y IRK+ ++ H
Sbjct: 362 DGLTKFKANFAPTIEEFIGEFNLPVSP-LYHIANTMYKIRKQLKNKH 407

SEQ ID 5468 (GBS377) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 65 (lane 4; MW 49kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 71 (lane 4; MW 74kDa).

GBS377-GST was purified as shown in Figure 212, lane 4.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
Example 1760

A DNA sequence (GBSx1867) was identified in S.agalactiae <SEQ ID 5471> which encodes the amino
acid sequence <SEQ ID 5472>. Analysis of this protein sequence reveals the following: 



Possible site: 22

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2073 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
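
The "Final Results" lines reported for each protein share one layout: a location name, a certainty value, and a verdict in parentheses. A minimal Python sketch of reading such a block and picking the highest-certainty location is given here. The names `RESULT`, `parse_final_results` and `predicted_location` are illustrative assumptions, not part of the PSORT program, and the pattern deliberately tolerates a stray space inside the number, as occurs in scanned copies.

```python
import re

# Hypothetical pattern for one "Final Results" line. It accepts optional
# dashes after the location and a space inside the number ("0. 2073").
RESULT = re.compile(
    r"bacterial\s+(\w+)[\s-]*Certainty\s*=\s*(\d)\s*\.\s*(\d+)\s*\(([^)]*)\)"
)

def parse_final_results(text):
    """Map each location to (certainty, verdict)."""
    results = {}
    for location, whole, frac, verdict in RESULT.findall(text):
        results[location] = (float(f"{whole}.{frac}"), verdict.strip())
    return results

def predicted_location(results):
    """Return the location with the highest certainty value."""
    return max(results, key=lambda loc: results[loc][0])

block = """
bacterial cytoplasm --- Certainty=0.2073 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""
results = parse_final_results(block)
print(predicted_location(results), results)
```

Taking the maximum certainty is a simplification chosen for the sketch; the verdict text is kept alongside the value so that a caller can apply a stricter rule.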

A related GBS nucleic acid sequence <SEQ ID 9627> which encodes amino acid sequence <SEQ ID 9628> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC76720 GB:AE000446 orf, hypothetical protein [Escherichia coli K12]
Identities = 127/269 (47%), Positives = 189/269 (70%), Gaps = 1/269 (0%)

Query: 7 SIKLVAVD1DGTLLNSKREITPEVAKAVQEAK3KGVKIVIATGRPI IGVQDLLEELKLNE 66 

+ IKL+A+D+DGTLL I+P V A+ A+++GV +V+ TGRP GV + L+EL + + 

Sbjct: 2 AIKLIAID^GTLLLPDHTISPAWmiAAARARGWvVLTTGRPYAGVHNYLKELHMEQ 61 

Query: 67 EGDYVITFNGGLVQDTATGDDIIKETLTyEDYLDFELLARKLGVHMHAITKEGIYTANRD 126 

GDY IT+NG LVQ AG + + L+Y+DY E L+R++G H HA+ + +YTANRD 
Sbjct: 62 PGDYCITYNGAIjVQKAADGSTVAQTALSYDDYRFLEKLSREVGSHFHALDRTTLYTANRD 121 

Query: 127 IGKYTIHETOLVNMPLFYRTPEEMG-DKBIIKLMMIDQPD1LDAAIAKIPKKVLDNYTIV 185 

I YT+HE + +PL + E+M + + +K+MMID+P ILD AIA+IP++V + YT++ 
Sbjct: 122 ISYYTVHESFVATIPLVFCFAEKMDPNTQFLKVIWIDEPAILDQAIARIPQEVKEKYTvl; 181 

Query: 186 KSTPFYLEILPKIWNKGTALLHIjAEKMGLTvDCTMAIGDEENPRAMLEWGNPVVMQNGN 245 

KS P++LEIL K VNKGT + LA+ +G+ ++ MAIGD+END AM+E G V M N 
Sbjct: 182 KSAPYFLEILDKRVNKGTGVKSLADVLGIKPEEIMAIGDQENDIAMIEYAGVGVAMDNAI 241 

Query: 246 PELKKIAKYITKSNEESGVAYALREWVIN 274 

p +K++A ++TKSN E GVA+A+ ++V+N 
Sbjct: 242 PSVKEVANFVTKSNLEDGVAFAIEKYVUI 270 
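
The summary line that heads each alignment gives identities, positives and gaps as counts over the alignment length, followed by a whole-number percentage. A minimal Python sketch of reading such a line and recomputing the ratios is given here. The name `parse_blast_summary` and the regular expression are illustrative assumptions rather than part of any BLAST tool; the sample string is the summary line of the E. coli alignment above.

```python
import re

# Hypothetical pattern for "Name = count/length (pct%)" fields; it tolerates
# the stray spaces seen in the printed summary lines.
FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)

def parse_blast_summary(line):
    """Return {field: (count, length, printed_percent, computed_percent)}."""
    stats = {}
    for name, count, length, pct in FIELD.findall(line):
        count, length, pct = int(count), int(length), int(pct)
        stats[name] = (count, length, pct, 100.0 * count / length)
    return stats

summary = "Identities = 127/269 (47%) , Positives = 189/269 (70%) , Gaps = 1/269 (0%)"
stats = parse_blast_summary(summary)
for name, (count, length, printed, computed) in stats.items():
    print(f"{name}: {count}/{length} printed {printed}% computed {computed:.1f}%")
```

Keeping both the printed and the recomputed percentage makes it easy to spot a summary line whose digits were misread during scanning.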

A related DNA sequence was identified in S.pyogenes <SEQ ID 3407> which encodes the amino acid 
sequence <SEQ ID 3408>. Analysis of this protein sequence reveals the following: 

Possible site: 36 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3474 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 197/268 (73%) , Positives = 235/268 (87%) 

Query: 7 SIKLVAVDIDGTLLNSKREITPEVAKAVQEAKSKGVKIVIATGRPIIGVQDLLEELKIiNE 66 

SIKLVAVDIDGTLL R IT +V +AVQEAKJ-+GV +VIATGRPI GV LLE+L+LN 
Sbjct: 2 SIKLVAVDIDGTLLTDDRRITDDVFQAVQEAKAQGVHWIATGRPIAGVISLLEQLEUffl 61 

Query: 67 EGDYVITFNGGLVQDTATGDDIIKETLTYEDYLDFELLARKLGVHMHAITKEGIYTANRD 126 

+G++VITFNGGLVQD TG++I+KE +TY+DYL+ E L+RKLGVHMHAITKEGIYTANR+ 
Sbjct: 62 KGNHVITBNGGLVQDAETGEEIVKEIjyiTYDDYI^TEFLSRKLGVHimAITKEGIYTANRN 121 



Query: 127 IGKYTIHEVTLVImPLFYRTPEEMGDKEIIKL^MIDQPDILDAAIAKIPKKVLDNYTIVK 186 
IGKYT+HE TLVNMP+FYRTPEEM +KEIIK+MMID+PD+LDAM +IP+ D YTIVK 
Sbjct: 122 IGKfTVHESTLVHMPIFYRTPEEMTNKEIIKMMMIDEPDLLDAAIKQIPQHFFDKYTIVK 181 

Query: 187 STPFYLEILPKtmJKGTALLHiyffiKMGLTVDQTMAIGDEENDPJfflLEVVGNPVVMQNGNP 246 

STPFYLE +PK V+KG A+ HLA+K+GL + OTMAIGD ENDRAMLEW NPWM+NG P 
Sbjct: 182 STPFYLEFMPKTOSKGNAIKHIAKKLGLDMSQT^IGDAENDRMLEVVAIIPVVMENGVP 241 

Query: 247 ELKKIAKYITKSNEESGVAYALREWVIN 274 

ELKKIAKYI TKSN +SGVA+A+R+WV+N 
Sbjct: 242 ELKKIAKYITKSHNDSGVAHAIRKWV1JJ 269 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1761 

A DNA sequence (GBSx1868) was identified in S.agalactiae <SEQ ID 5473> which encodes the amino
acid sequence <SEQ ID 5474>. Analysis of this protein sequence reveals the following: 



>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2360 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.

>GP:BRBC7537 GB:AP001520 unknown conserved protein [Bacillus halodurans] 
Identities = 211/423 (49%), Positives = 285/423 (66%), Gaps = 5/423 (1%) 

EKVFRDPWTYIHVNNQVIYDLINTKEFQRLRRIKQTSTTSFTFHGAEHSRFSHCDGVYE 62 
EKVF+DPVH YIHV +++I+ LI TKEFQRLRR++Q TT TFHGAEH+RF+H LGVYE 
EKVFKDPVHRYIHVRDELIWALIGTKEFQRLRRVRQLGTTFLTFHGAEHTRFNHSLGVYE 71 






63 LARKVTEI FDEHYSDLWNKNESLLTMAAALLHD IGHGAYSHTFERLFNTDHEAYTQE 1 1 T 122 

+ R++ E+F WN+ E LLT+ AALLHDIGHG +SH+FE++F+TDHE +T+ +1 

72 ITRRIIEVFQGR- - PYWNEEERLLTLCAALLHDIGHGPFSHSFEKVFDTDHEEWTRRMIV 129 

123 NPTTEINAILRKVAPDFPDKVASVINHSYPNKQWQLISSQIDCDRNDYLLRDSYYTAAS 182 

T EI+ +L K+ DFP KVA VI +YPNK V +ISSQID DRMDYL RD+YYT S 
130 GDT-EIHNVLLKMGDDFPQKVADVIEKTYPNKLVTSIISSQIDADRMDYLQRDAYYTGVS 188 

183 YGQFDLTRILRVIRPTDSGIAFARNGMHAVEDYIVSRFQMYMQVYFHPASRAMELLLQNL 242 

YG FD+ RILRV+RP + + ++GMHAVEDYI+SR+QMY QVYFHP +R+ E++L + 
189 YGHFDMERILRVMRPMEDQWIKQSGMHAVEDYIMSRYQMYWQVYFHPVTRSAEVILSKV 248 

243 LKRARFLFDTHRDFFEQTSPNLIPFFTDQYDLQDYIALDDGVMNTYFQSWMQADDNILAD 302 

KR + L++ F+Q + F L DYL LD+ + YFQ W + +D IL+D 

249 FKRVTOOLYEQGYK-FKQEPraFYSLFEGlsMSLDDYLRLDESITMyYFQIWQEEEDRILSD 307 

303 LANRFINRKVFKSITFEESDKEN-LVKMKELVSQVGFDPDYYTGVHANFDLPYDVYRPEH 361 

L RFINR++FK IF + + N ++++L +Q DP+YY V ++ DLPYD YRP 
308 L(^FimQLFKYIEFNPNLQMND5?PRLQQLFAQAEIDPEYYLVVDSSSDLPYDFYRPGE 367 



422 SYI 424 

S + 
428 SLL 430 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5475> which encodes the amino acid 
sequence <SEQ ID 5476>. Analysis of this protein sequence reveals the following: 

Possible site: 55 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2220 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 321/428 (75%) , Positives = 379/428 (88%) 

Query: 1 MNEKVFRDPVHTYIHVNNQVIYDLINTKEFQRLRRIKQTSTTSFTFHGAEHSRFSHCLGV 60
MNEKVFRDPVH YIH++N +IYDLINTKEFQRLRRIKQ TT+FTFHGAEHSRFSHCLGV

Sbjct: 1 MNEKVFRDPVHNYIHIDNPLIYDLINTKEFQRLRRIKQVPTTAFTFHGAEHSRFSHCLGV 60

Query: 61 YELARKVTEIFDEHYSDLWNKNESLLTMAAALLHDIGHGAYSHTFERLFNTDHEAYTQEI 120 
YE+AR+VT IF+E Y+D+WNK+ESL+TM AALLHDIGHGAYSHTFE LF+TDHFA+TQEI 
Sbjct: 61 YEIARRVTAIFEEKYAD1WNKDESLVTMTAALLHDIGHGAYSHTFEVLFHTDHFAFTQEI 120

Query: 121 ITNPTTEINAILRKVAPDFPDKVASVINHSYPNKQWQLISSQIDCDRMDYLLRDSYYTA 180 

ITNP TEINAIL + APDFPDICVASVINH+YPNKQWQLISSQIDCDRMDYLLRDSY++A 
Sbjct: 121 ITNPETEINAILVRHAPDFPDKVASVINHTYPNKQWQLISSQIDCDRMDYLLRDSYFSA 180


Query: 181 ASYGQFDLTRILRVIRPTDSGIAFARNGMHAVEDYIVSRFQMYMQVYFHPASRAMELLLQ 240 

A+YGQFDL RILRVIRP + GI F +GMHAVEDYIVSRFQMYMQVYFHPASRA+EL+LQ 
Sbjct: 181 ANYGQFDLMR1LRVIRPVEDGIVFEHSGMHAVEDYIVSRFQMYMQVYFHPASRAVBLILQ 240 

Query: 241 NLLI<R^FLFDTHRDFFEQTSPNLIPFFTDQYDLQDYLALDDGVMNTYFQSWMQADDN1L 300

NL£jKRA+ L+ + +F++T+P LIPFF + +L DY+ALDDGVMNTYFQ WM ++D+1L 
Sbjct: 241 NLLKPAQHLYPEQQRYFQKTAPGLIPFFEKKANLADYIALDTCWMTYFQVWMASEDHIL 300 

Query: 301 ADLANRF1NRKVFKSITFEESDKENLVKMKELVSQVGFDPDYYTG\T1ANFDLPYDVYRPE 360 
+DLA4RFINRK+ KS+TF++ + L ++++LV VGFDPDYYTG+H NFDLPYD+YRPE

Sbjct: 301 SDLASRFINRKILKSVTFDQDSQGELERIiRQLVESVGFDPDYYTGIHINFDLPYDIYRPE 360

Query: 361 HSNPRTEIQIIQKNGQLftELSSLSPIVKALTGSNYGDQRFYFPKEMLTLDSLFSSTKEEF 420 
NPRT+I+++QK+G LAELS LSPIVKALTG+ YGD+RFYFPKEML LD LF+ +KE F 
Sbjct: 361 LENPRTQIEMI1QKDGSLAELSQLSPIVKALTGTTYGDRRFYFPKEMLELDDLFAPSKETF 420

Query: 421 QSYITNEH 428 

SYI+N H 
Sbjct: 421 MSYISNGH 428 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1762 

A DNA sequence (GBSx1869) was identified in S.agalactiae <SEQ ID 5477> which encodes the amino
acid sequence <SEQ ID 5478>. Analysis of this protein sequence reveals the following:

Possible site: 46 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4789 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
A related DNA sequence was identified in S.pyogenes <SEQ ID 5479> which encodes the amino acid
sequence <SEQ ID 5480>. Analysis of this protein sequence reveals the following:

Possible site: 57

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3650 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 64/127 (50%) , Positives = 89/127 (69%) 

Query: 5 MKLEINNNIQIDNETEMIHEIHDCQFIEKGSYVYLNYINAEGERWIKANHEELLMTRFS 64 

MKL++ N+I+ +ETE+I EIHDC++ EKG Y YL Y N + E+WIK N EL M+RFS 
Sbjct: 1 MKLQLTNHIRFGDETEIIQEIHDCEWREKGGYQYLIYQNTDKEKVVIKYNETELTMSRFS 60 

Query: 65 NPKSVMRFHRETPALVNIPTPLGVQHLITETSHYQFDLSQQRLHINYVLKQTETGDCFAN 124 

NP+S+M+F L+ +PTP+GVQ +T+TSHY D S Q+L ++Y h Q +T FA+ 

Sbjct: 61 NPQSIMKFFAGKKVLIALPTPMGVQQFLTDTSHYHLDCSCQKLDLHYHLLQAQTEMLFAS 120 

Query: 125 YELRIQW 131 

Y L + W 
Sbjct: 121 YHLELSW 127 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1763 

A DNA sequence (GBSx1870) was identified in S.agalactiae <SEQ ID 5481> which encodes the amino
acid sequence <SEQ ID 5482>. This protein is predicted to be cation-transporting ATPase PacL (ctpF).
Analysis of this protein sequence reveals the following: 



Possible site: 14
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = Transmembrane 256 - 272 ( 246 -
INTEGRAL Likelihood = Transmembrane 64 - 80 ( 58 -
INTEGRAL Likelihood = Transmembrane ( 828 - 855)
INTEGRAL Likelihood = Transmembrane - 105 ( 81 -
INTEGRAL Likelihood = Transmembrane - 880 ( 860 -
INTEGRAL Likelihood = Transmembrane - 303 ( 284 -
INTEGRAL Likelihood = Transmembrane - 770 ( 753 -
INTEGRAL Likelihood = Transmembrane - 711 ( 694 -
INTEGRAL Likelihood = Transmembrane 793 - 809 ( 792 -

Final Results

bacterial membrane --- Certainty=0.6307 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB13439 GB:Z99112 similar to calcium-transporting ATPase [Bacillus subtilis]
Identities = 380/888 (42%), Positives = 545/888 (60%), Gaps = 49/888 (5%)



Query: 10 FYTQGQEEVLTSLESS-REGLSTTEAKNRLEMYGRNELEEGKKRSLIAKFFDQFKDLMII 68 

F4 GQ ++L + +S ++GL+ E K RL+ +G NEL+EGKK S + FF QFKD M++ 
Sbjct: 3 FHEMGQTDLLEATlOTSMKCjGLTEKEvKKRLDKHGPNELQEGKKTSALLLFFAQFKDFMVL 62 



Query: 69 ILLVAAALSVITEGMHG-LTDALIILAWILNAAFGVYQEGQAEAAIEALKDMSSPIARV 127 
Sbjct: 63
Query: 128
Sbjct: 119
Query: 183
Sbjct: 179
Query: 248
Sbjct: 239
Query: 308
Sbjct: 298
Query: 354
Sbjct: 358
Query: 406
Sbjct: 418
Query: 466
Sbjct: 478
Query: 524
Sbjct: 536
Query: 584
Sbjct: 593
Query: 644
Sbjct: 653
Query: 704
Sbjct: 713
Query: 763
Sbjct: 773
Query: 820
Sbjct: 824


G G DA+ I+A+V +N G +QE +AE +++ALK++S+P 



- SKELVPGD+V +GD + AD+R++EA SL+IEE+ALTGES+PV K 



D +GD NMA+ + VTSG GW TGM T +GKIADML +A 



V+LAVAAIPEGLPAIVT+ LS+G 



- +1 KGAPD L++R ++I 



v LA QALR + +AY+, 



P RPE +A++ +EAGI+ +MITGDH +TA+AIAK L ++ 



EE V 4- V+ ARVS PEHK+ + 1 VKA+Q +G +VAMTGDGVNDAP++K ADIG+ MGI 



)TI LVLGVYGWALMY PEHAGYRM I HADALTMAFATLGL I QLVHAFNVKS VYQS I F 819 

I V + + ++Y PE+ Y A T+AFATL L QL+H F+ +S S+F 

• - IGVATILAFI IVYHRNPENLAY AQTIAFATLVLAQLIHVFDCRS - ETSVF 823 

aFKNRTFNWSIPVAFILLMVTIWPGFNKLFHVTHLSSTQWLTW 867 

F+N ++ + +L++V I P +FH ++ W+ V+ 

JPFQNLYLIGAVLSSILLMLWIYYPPLQPIFHTVAITPGDWMLVI 871 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4171> which encodes the amino acid
sequence <SEQ ID 4172>. Analysis of this protein sequence reveals the following:

Possible site: 34 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-12.47 Transmembrane 863 - 879 ( 856 -
INTEGRAL Likelihood = Transmembrane 64 -
INTEGRAL Likelihood = Transmembrane 256 - 272 ( 249 - 275)
INTEGRAL Likelihood = Transmembrane 89 - 105 ( 81 - 107)
INTEGRAL Likelihood = -5 Transmembrane 832 - 848 ( 827 - 850)
INTEGRAL Likelihood = -3 Transmembrane 287 - 303 ( 284 - 307)
INTEGRAL Likelihood = -2 Transmembrane 762 - 778 ( 761 - 779)
INTEGRAL Likelihood = -0.37 Transmembrane 685 - 701 ( 685 - 701)

Final Results 

bacterial membrane --- Certainty=0.5989 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 735/892 (82%), Positives = 813/892 (90%), Gaps = 1/892 (0%) 

Query: 3 KEQKKSLFyTQGQEEVLTSLESSREGLSTTEAKNRLEMYGRNELEEGKKRSLIAKFFDQF 62 

KEQ+ FYTQ +E VL LE+SREGL++ +AK RL YGRNEL+EG+KRSL KF DQF 
Sbjct: 3 KEQRHFAFyTQSEETVLAQLETSREGLTSAQAKERLAEYGRNELDEGEKRSLFMKFLDQF 62 

Query: 63 KDLMIIILLVAA^SVITEGMHGLTDALIILAWIIjNAAFGVYQEGQAEAAIEALKDMSS 122 

KDLMI I IL+VAA LSV+TEGM GLTDA+IILAWILNAAFGVYQEGQAEAAIEALK MSS 
Sbjct: 63 KDLMI I ILIVAALLSVLTEGMEGLTDAI I IIiAWILNAAFGVYQEGOAEAAIEALKSMSS 122 

Query: 123 PIARVRRDGHTIEVDSKELVPGDLVMLFAGDWPADLRLLEAASLKIEEAALTGESVPVE 182 

P+AR+RRDGH E+DSKELVPGD+V+LEAGDWPADLRLLEA SLKIEEAALTGESVPVE 
Sbjct: 123 PIARIRRDGHVTEIDSKELVPGDIVIiLEAGDVVPADLRLLE7ANSLKIEE7AALTGESVPVE 182 

Query: 183 KDISQWAEDAGIGDRVNMAYQNSNVTYGRGYGVVTNTGMYTEVGKIADMIANADESETP 242 

KD+S V+EDAGIGDRVNM YQNSNVTYGRG GV+TNTGMYTEVG IA MLANADE++TP 
Sbjct: 183 KDLSTAVSEDAGIGDRVNMGYQNSNVTYGRGIGVITNTGMYTEVGHIAGMLANADETDTP 242 

Query: 243 LKQSLVQLSKLLTYLIVIIAVITFLVGIFVRKEGWIEGLMTSVALAVAAIPEGLPAIVTI 302 

LKQ+L LSK+LTY I++IA +TF VG+F+R + +EGLMTSVALAVAAI PEGLPAIVT+ 
Sbjct: 243 LKQNLDNLSKILTYAILVIAAVTFAVGVFLRGQHPLEGLMTSVALaVAAIPEGLPAIVn/ 3 02 

Query: 303 VLSMGTKTLAKRNSITOKLPAWTl^STEIIASDKTGITjTMNQMTVEKVYTNGVLQSSSE 362 

VLS+GT+ LAKRN+ 1 +RKLPAVETLGSTEI IASDKTGTLTMNQMTVEKVYTNG LQSSS 
Sbjct: 303 VLSLGTQVLAKRNAIIRKLPAVETLGSTEIIASDKTGTLTTOQMTVEKVYTNGTLQSSSA 3 52 

Query: 363 EISVDNNTIjRIMNFSNDTKIDPSGKLIGDPTETALVQFGLDKNFDVREVLKNEPRVAELP 422 

+1+ DN TLR+MNF+NDTK+DPSGKLIGDPTETALV+FGLD NFDVRE + EPRVAELP 
Sbjct: 363 DIAFDNTTLRVMNFANDTKVDPSGIOLIGDPTETALVEFGLDHNFDVREAiyiVAEPRVAELP 422 

Query: 423 FDSDRKLMSTIHKESDGRYFIAVKGAPDQLLKRVTKIEDNGLVRDITAEDKEAILNTNKE 482 

FDSDRKLMSTIHK++DG+YFIAVKGAPDQLLKRVT+IE+NG +R IT DK+ IL+TNK 
Sbjct: 423 FDSDRKLMSTIHKQADGKYFIAVKGAPDQLLKRVTQIEENGQIRPITDADKKTILDTNKS 482 

Query: 483 LAKQALRVLMMAYKYETQIPSLETDIVESDLVFSGLVGMIDPERPEAAEAVRVAKEAGIR 542 

LAKQALRVLMMAYKY +P+LET+IVE++LVFSGLVGMIDPERPEAA+AV+VAKEAGIR 
Sbjct: 483 IAKQALRVLMmYKYSDALPTLETEIVEAriLVFSGLVGMIDPERPEAAQAVKVAKEAGIR 542 

Query: 543 PIMITGDHQDTAEAIAKRLGIIDANDTEDHVFTGAELNELSDEEFQKVFKQYSVYARVSP S02 

PIMITGDHQDTA+AIAKRLGI I + D DHVFTGAELNELSDEEFQKVFKQYSVYARVSP 
Sbjct: 543 PIMITGDHQDTAKAIAKRLGIIE-EDGVDHVFTGAELNELSDEEFQKVFKQYSVYARVSP 601 

Query: 603 EHKOTIVKAWQNDGKWAMTGDGVNDAPSLKTADIGIGMGITGTEVSKGASDMVIADDNF 662 

EHKVRIVKAWQN+GKWAMTGDGVNDAPSLKTADIGIGMGITGTEVSKGASDMVLADDNF 
Sbjct: 602 EHKVRIVKAWQNEGKWAMTGDGVNDAPSLKTADIGIGKGITGTEVSKGASDMVLADDNF 661 

Query: 663 ATIIVAVEEGRKVFSNIQKSIQYLLSANMAEVFTIFFATLLGWDVLAPVHLLWINLVTDT 722 

ATI IVAVEEGRKVFSNIQK+ IQYLLSANMAEVFTIF ATL GWDVL PVHLLWINLVTDT 
Sbjct: 662 ATIIVAVEEGRWFSNIQKTIQYIiSANMAEVFTIFLATLFGWDVLQPVHLLWINLVTDT 721 

Query: 723 LPAIALGVEPAEPGVMTHKPRGRQSNFFDGGVMGAIIYQGILQTILVLGVYGWALMYPEH 782 

LPAIALGVEPAEPGVM HKPRGR+S+FFDGGV AI+YQG QTILVLGVYG+ALM+PEH 
Sbjct: 722 LPAIALGVEPAEEGVT^KHKPRGRKSSFFDGGVKEAILYCGAFQTILVLGVYGFALMFPEH 781 

Query: 783 AGYRMIHADALT^FATLGLIQLVHAFNVTCSWQSIFTV'GAFKNRTFNWSIPVAFILLMV 842 

Y +HADALTMA+ TLGLIQLVHA+NVKSVYQSIFTVG FKN+ FN+SIPVAF+ LM 
Sbjct: 782 TSYHDVHADALTMAYVTLGLIQLVHAYNVKSWQSIFTVGLFKNKLFNYSIPVAFVALMA 841 



Query: 843 TIWPGFNKLFHVTHLSSTQWLTWIGSLLilWLTEIVKFIQRKLGQDEKAI 894 



A related GBS gene <SEQ ID 8897> and protein <SEQ ID 8898> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 6 
McG: Discrim Score: -S.88 
GvH: Signal Score (-7.5): -6.96

Possible site: 14 
>>> Seems to have no N-terminal signal sequence
ALOM program count: 9 value: -13.27 threshold: 0.0 
INTEGRAL Likelihood = Transmembrane
INTEGRAL Likelihood = Transmembrane
INTEGRAL Likelihood = Transmembrane ( - 855)
INTEGRAL Likelihood = Transmembrane ( - 107)
INTEGRAL Likelihood = Transmembrane 864 - 880 ( - 884)
INTEGRAL Likelihood = Transmembrane ( - 306)
INTEGRAL Likelihood = Transmembrane ( - 773)
INTEGRAL Likelihood = Transmembrane 695 - 711 ( - 711)
INTEGRAL Likelihood = Transmembrane ( - 809)
modified ALOM score: 3.15

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.6307 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

ORF01112(328 - 2901 of 3282) 

EGAD|108247|BS1566(3 - 871 of 890) hypothetical protein {Bacillus subtilis} OMNI|NT01BS1841
cation-transporting ATPase PacL GP|2337795|emb|CAA74269.1||Y13937 putative PacL protein
{Bacillus subtilis} GP|2633938|emb|CAB13439.1||Z99112 similar to calcium-transporting
ATPase {Bacillus subtilis} PIR|H69877|H69877 calcium-transporting ATPase homolog yloB -
Bacillus subtilis
%Match =29.0 

%Identity =43.9 %Similarity =64.5 

Matches = 376 Mismatches = 291 Conservative Sub.s = 176 



249 279 309 339 369 396 426 456



GVVLNSETCFHKlTOSLFVCGETKGGKVLLKEQKXSLFYTQGQEEVLTSLESS-REGLSTTEAKmLEMYGRlTOLEEGKKR 
1= II ::| = =1 <«||« I I 11= :| llhllll 



SLIAKFFDQFKDLMIIILLVAAALSVITEGfflGLTDALIIIAWIMAAFGWQEGQAEAAIEALKDMSSPIARVRRDGH 

I = II 1111:1:^11 I = = 1= = 11= I = I = I =1 =1 =11 =11 " = I I I - I = I 1 = 1 

SALLLFFAQFKDFMVLVLL- - -AATLISGFLGEYVDAVAI IAIVFVNGILGFFQERRAEQSLQALKELSTPHVMALREGS 



TIETOSKELVPGDLVMLEAGDWPADLRLLEAASLKIEEAALTGESVPVEICDISQWAEDAGIGDRVNMAYQNSNVTYGR 

= = MINIM: = =11 = ||:|::|) I I = I I I = I I I I I I = I I I == I =11 111= = II 1 
WTKIPSKELVPGDIVKFTSGDRIGADWIVEARSLEIEESALTGESIP\n^CHADKLHCPDVSLGDITNMAFMGTIVTRGS 



140 150 160 170 180 190 200

966 996 1026 1056 1086 1116 1146 1176 

GYGVVTNTGMYTEVGKIADMLANADESETPLKQSLVQLSKLLTYLIVI IAVITFLVGIFVRKEGWIEGLMTSVALAVAAI 
I III III I =1111111 =1 |||:: I II 1=1 = === 1= 11= == == 1=111111 

GVGWVGTGMNTAMGKIADMLESAGTLSTPLQRRLEQLGKILIVVALLLTVLWAVGV- IQGHDLYSMFLAGVSLAVAAI 



220 230 250 260 270
1206 1236 1266 1296 1326 1356 1374

PEGLPAIOTIVLSMGTKTLAKRNSIWKLPAVETLGSTEIIASDKTGTLTm^QMTVEKVYT NGVLQ 

lllllll)]: ||:| : : |: | | | | | I I I I I I I I II 111111 = 1 1 = 111 l = = I = 

PEGLPAI VTVALSLGVQRMI KQKS I VRKLPAVETLGCAS 1 1 aSDKTGTMTQNKMTVTHVWSGGKTWRVAGAGYEPKGSFT 
300 310 320 330 340 350 360

1404 1440 1470 1500 1530 1560 1590 

SSSEEISVDMNT LRIMNFSNDTKIDPSGKLIGDPTETALVQFGLDKNFDVREVLKNEPRVAELPFDSDRKLM 

: :|||h = =111111 Mill 11= 111= 1=1111 11 = 1 

IJSIEKEISVNEHKPLQQMLLFGALCNNSNIE

380 390 400 410 420 430 440 

1620 1650 1680 1710 1740 1770 1794 1824 

STIHKESDGRYFIAVKGAPDQLLKROTKIEDNGLVRDI^ 
: | : | : :| Mill l = = l = = l =1 =11 = II I I I I = =11= = 11 = 1

WIVENQDRKRYIITKGAPDVLMQRSSRIYYDGSAALF3^RKAETEAVLRHLASQALRTIAVAYRPIKAGETPSME--Q 
460 470 430 490 500 510 520 

1854 1884 1914 1944 1974 2004 2034 2064 

VESDLVFSGLVGMIDPERPFAAFAVRVAKEAGIRPIMITGDHQDTAEAIAKRLGIIDANDTEDHVFTGAELNELSDEEFQ

i ii ii 1 = 111 iii =i = = =iiii= =111111 =11 = 1111 i = i him n = 

AEKDLTMLGLSGIIDPPRPEVRQA1KECREAGIKTVMITGDHVETAKAIAKDL RLLPKSGKIMDGKMIjNELSQEEIjS 

530 540 550 560 570 580 590 

2094 2124 2154 2184 2214 2244 2274 2304

KVFKQYSVYARVSPEHKVRIVKAWQNDGKWAMTGDGVNDAPSLKTADIGIGMGITGTEVSKGASDMVLADDNFATIIVA 

i = Miiiiiii = = im = i =i miiinniimi nm mii) = i = i ii =n mini i 

HVVEDVYVFARVSPEHKLKIVKAYQENGHIVAMTGDGVNDAPAIKQADIGVSMGITGTDVAKEASSLVLVDDNFATIKSA 
610 620 630 640 550 660 670 


2334 2364 2394 2451 2481 2511 2541 

VEEGRKVFSNIQKSIQYLLSAlMAEVFTIFFATLLGVTOV-LAPVHLIiWI^VTDTLPAIALGVEPAEPGVMTHKPRGRQS 

: = ||| :: Ihl |:||| = = l= 1 = = = = ll II = I |: = = ll = lllll 111 = 111 = = I II III = 
IKEGRNIYENIRKFIRYLLASWGEILVMLFAMLIALPLPLVPIQILWWLVTDGLPAMALGMDQPEGDVMKRKPRHPKE 
690 700 710 720 730 740 750

2571 2601 2631 2661 2691 2721 2751 2781 

NFFDGGVMGAIIYQGILQTILVLGVYGWALMYPEHAGYRMIHADALTMAFATLCLIQLVHAFIWKSWQSIFWGAFK^ 

I = == =1 1 I = = ==l 1 1 1 1=11111 111=1 1= =1 M= 1=1 

GVFARKLGWKWSRGFLIG--VATILAFIIVY--HRN-PENLAYAQTIAFATLVLAQLIHVFDCRS-ETSVFSRNPFQNL
770 780 790 800 810 820 830 

2811 2841 2871 2901 2931 2961 2991 3021 

TFNWSIPVAFILLIWIIWPGFNKLFHVTHLSSTQWLTWIGSLLMVVLTEIWFIQRKLGQDEKAI*FS**KNSLRIS^ 
: :: : :|::| | | : :|| |: |:

YL1GAVLSSILLMLWIYYPPLQPIFHTVAITPGDWMLVIGMSAIPTFLLAGSLLTRKK 
850 860 870 880 890 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1764 

A DNA sequence (GBSx1871) was identified in S.agalactiae <SEQ ID 5483> which encodes the amino
acid sequence <SEQ ID 5484>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2905 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 
>GP:CAB48940 GB:AJ248283 hypothetical protein [Pyrococcus abyssi]
Identities = 60/221 (27%) , Positives = 100/221 (45%) , Gaps = 37/221 (16%) 

Query: 33 KIDHLHIA GD I SNHFTKDTLP - FINNLKKH 1 KLSYNLGNHDMLDLTE - - TE 80 

KID L I GD+SN+ D + 1+ L + L GNHD+ L +

Sbjct: 15 KIDVLKIPDIAIQLGDLSNYGEPDIIENLISELVTQLDPVPLLVIPGNHDIYGUTOIFAA 74 

Query: 81 IQRLDFQTYR-- FDKKMLLAFHGVreDYSFSHN- -RDIKDVEKLKKTFWFD 126 

QR + R ++ ++ GWYDYS + KD ++K F F 

Sbjct: 75 FQRFNK1VKRAGAIPLMEGPLILEEIGIVGVPGWYDYSLAPGYLNMTKDEYEIK-AFGFR 133

Query: 127 RR LKRPNNDVTIQASILKRLDEILAKTOSS--NIIIAMHFVPHKQFTMT--HPRF 177 

R +K +D + L L++ ++++ S ++I+A+HF P K +P 

Sbjct: 134 RLEDADYIKSSLSDEELWWNLNLLEKFISEIRESVNDVILALHFAPFKDSLKYTGNPEI 193 


Query: 178 SPFNAFLGSQAYHDLFQKYHIKDWFGHAHRSFGDVKIGET 218 

F+A++GSQ + + +++I +V GH HRS + IG+T 
Sbjct: 194 DYFSAYMGSQRFGEFALRHNIGLIVHGHTHRSI-EYYIGKT 233 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1765 

A DNA sequence (GBSx1872) was identified in S.agalactiae <SEQ ID 5485> which encodes the amino
acid sequence <SEQ ID 5486>. Analysis of this protein sequence reveals the following:

Possible site: 44 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -2.18 Transmembrane 173 - 189 ( 173 - 189) 

Final Results

bacterial membrane --- Certainty=0.1871 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
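
Each "INTEGRAL" line above reports a likelihood value, a residue window, and a second residue range in parentheses. A minimal Python sketch of reading one such line is given here. The names `INTEGRAL` and `parse_integral` are hypothetical helpers, and the labels `window` and `range` are neutral names chosen for the sketch rather than terms used by the prediction program.

```python
import re

# Hypothetical pattern for one complete "INTEGRAL" line as printed in this text.
INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_integral(line):
    """Return the parsed fields, or None if the line is incomplete."""
    m = INTEGRAL.search(line)
    if m is None:
        return None
    start, end, outer_start, outer_end = (int(g) for g in m.groups()[1:])
    return {
        "likelihood": float(m.group(1)),
        "window": (start, end),
        "range": (outer_start, outer_end),
        "window_length": end - start + 1,
    }

line = "INTEGRAL Likelihood = -2.18 Transmembrane 173 - 189 ( 173 - 189)"
print(parse_integral(line))
```

Returning `None` for a line that does not match keeps truncated rows from being mistaken for complete predictions.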

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB16056 GB:Z99124 fructose-1,6-bisphosphatase [Bacillus subtilis]
Identities = 314/642 (48%), Positives = 446/642 (68%), Gaps = 7/642 (1%)

Query: 2 SNFYKLLKEKFPRKEDIVTEMINLEAICQLPKGTEYFISDLHGEYDAVDYLLRTGAGSIR 61 
S + LL +K+ +E +VTE+INL+AI LPKGTE+F+SDLHGEY A ++LR G+G ++

Sbjct: 33 SKYLDLIAQKYDCEEKVVTEIINLKAILNLPKGTEHFVSDLHGEYQAFQHVLRNGSGRVK 92 

Query: 62 AKLLDCFDWQKIVAVDLDDFCILLYYPKEKLAFDI<MNLSASAYKTKLW-EMIPLQIQVLK 120 
K+ D F I ++D+ L+YYP++KL K + A + + E I I+++ 

Sbjct: 93 EKIRDIFSGV-IYDREIDELAALVYYPEDKLKLIKHDFDAKEAIiNEWYKETIHRMIKLVS 151

Query: 121 YFSSECYTKSKVRKQLSGKFAYIIEELLAEIDRNPEKKSYFDTIIEKLFELDQVEDLIIVL 180 

Y SSKYT+SK+RK L +FAYI EELL + ++ K+ Y+ II+++ EL Q + LI L 
Sbjct: 152 YCSSKYTRSKLRKALPAQFAYITEELLYKTEQAGNKEQYYSEIIDQHELGQADKLITGL 211 


Query: 181 SQTIQVLIIDHLITWGDIYDRGRYPDRILNRLMAFPOTjDIQWGNHDvTWMGAASGSYLCM 240 

+ ++Q L++DHLHWGDIYDRG PDRH- L+ + ++DIQWGNHDV W+GA SGS +C+ 
Sbjct: 212 AYSVQRLWDHLHV^/GDIYDRGPQPDRIMEELim"HSVDIQWGNHDVLWIGAYSGSKVCL 271 

Query: 241 VNVIRIAARYNNITLIEDRYGINLRRLVDYSRRYYEPLPSFVPILDGEEMTHPDELDLLN 300

N+IRI ARY+N+ +IED YGINLR L++ + +YY+ P+F P D E DE+ + 
Sbjct: 272 ANIIRICARYDNLDIIEDVYGINLRPLLNL2ffiKYYDDNPAFRPKM3--ENRPEDEIKQIT 329 

Query: 301 MIQQATAILQFKLEAQLIDRRPEFQMHNRQLXJSrQVJnrKDLSISIKEVTOQLKDFNSRCID 360 
I QA A++QFKLE+ +1 RRP F M R L+ +++Y I++ +QL++ 1+

Sbjct: 330 KIHQAIAMIQFKLESPIIICRRPNFNI4E3RLLLEKIDYDKNEITLNGKTYQLENTCFATIN 389 



WO 02/34771 PCT/GB01/04789 



Query: 361 SKNPSRLTSEEEELLQQLMIAFQTSESLKKHIDFLFEKGSMYLTVBDWLLFHGCIPMHSN 420 

+ P +L EE E++ +L+ + Q SE L +H++F+ +KGS+YL YN NLL HGCTP+ N 
Sbjct: 390 PEQPDQLLEEEAEVIDKLLFSVQHSEKLGjRHMNFMMKKGSLYLKKWGNLLIHGCIPVDEW 449

Query: 421 GDFKSFKIAGKTYGGRDLLDLFESQIRIiAYARPEKHDDIATDIlWYLWCGENSSLFGKNA 480 

G+ ++ I K Y GR+LLD+FE +R A+A PE+ DDLATD+ WYLW GE SSLFGK A 
Sbjct: 450 GNMETi^IEDKPYAGRELLDVFERFLRH^FAHPEETDDLATDMAWYLWTGEYSSLFGKRA 509 

Query: 481 MTTFERYYVSDKVTHQERKNPYFKLRDKDDICTALLQEFDL-PKFGHIVNGHTPVKEKNG 539 

MTTFERY++ +K TH+E+KNPY+ LR+ + C +L EF L P GHI +NGHTPVKE G 
Sbjct: 510 MTTFERYFIKEKETHECEKKNPYYYLREDEATCRNIIAEFGLNPDHGHIINGHTPVKEIEG 569 

Query: 540 EQPIKANGKMLVIDGGFAKGYQKNTGLAGYTL1YNSYGIQLISHLPFTSIEEVLSGTNYI 599 

E PIKANGKM+VIDGGF+K YQ TG+AGYTL+YNSYG+QL++H F S EVLS + 
Sbjct: 570 EDPIKANGKMIVIDGGFSKAYQSTTGIAGYTLLYNSYGMQLVAHKHFNSKAEVLSTGTDV 629 

Query: 600 IDTKRLVEEAKDRIDVKDTTIGQKLTKEIKDLDHL--YRHFQ 639 

+ KRLV++ +R VK+T +G++L +E+ L+ L YR+ + 
Sbjct: 630 LTVKRLVDKELERKKVKETNVGEELLQEVAILESLREYRYMK 671

No corresponding DNA sequence was identified in S.pyogenes.

SEQ ID 5486 (GBS197) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 168 (lane 17 & 18; MW 89kDa) and in Figure 169 (lane 2; MW 89kDa). It was 
also expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is shown in 
Figure 37 (lane 6; MW 99kDa). 

Purified Thio-GBS197-His is shown in Figure 244, lane 6. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
Example 1766 

A DNA sequence (GBSx1873) was identified in S.agalactiae <SEQ ID 5487> which encodes the amino
acid sequence <SEQ ID 5488>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2433 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12719 GB:Z99108 alternate gene name: ygaP-similar to 
hypothetical proteins [Bacillus subtilis] 
Identities = 176/367 (47%), Positives = 240/367 (64%), Gaps = 6/367 (1%) 

Query: 3 IKAEIQKLAKEIGISKIGFTTADNFDYLEKSLRASVEEGRNSGFEHKVIEDRIYPERLLE 62

+K E+ + AK IG+ KIGFTTAD FD L+ L G SGFE IE R+ P+ LL 

Sbjct: 55 LKEELIEYAKSIGVDKIGFTTADTFDSLKDRLILQESLGYLSGFEEPDIEKRVTPKLLLP 114 

Query: 63 SAKTIISIGVAYPHKLPQQPQKT-SYKRGKITPNSWGLDYHYWGEKLDRLSKGIEELCR 121 

AK+I++I +AYP ++ P+ T + +RG SWG DYH V+ EKLD L ++ 

Sbjct: 115 KAKSIVAIAIAYPSP^KDAPRSTRTERRGIFCRASMGKDYED^REKLDLLEDFLKSKHE 174 



Query: 122 DFPLQQKAMVDTGALVDTAVAQRAGIGFIGKITGLVISKEYGSYMFLGELITNLEIEPDKP 181 
D ++ K+MVDTG L D AVA+RAGIGF KN +H- + EYGSY++L E+ITN+. EPD P

Sbjct: 175 D--IRTKSMVDTGELSDRAVAERAGIGFSAKNCMITTPEYGSYVYLAEMITNIPFEPDVP 232 
Query: 182
Sbjct: 233
Query: 242
Sbjct: 293
Query: 300
Sbjct: 353
Query: 360
Sbjct: 412



CG C +CLDACPT L+ G +HA+RC+SF TQ KG + EFR KI +YGCD C 



L E+ DPE+A+P L P I.++SN +FKEKFG ++GSWRGK +QRN 



AI+ALA+ D +A+ +L E++ K+ P+ TA WA+G+I 



A related DNA sequence was identified in S.pyogenes <SEQ ID 5489> which encodes the amino acid 
sequence <SEQ ID 5490>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial cytoplasm --- Certainty=0.3337 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 353/374 (97%) , Positives = 367/374 (98%) 



LESAKTIISIGVAYPHKLPQQPQKTSYKRGKITPNSWGLDYHYWGEKLDRLSKGIEELC 120 
LESAKTI I S IGVAYPHKLPQQPQKT YKRGKITP+SWGLDYHYWGEKLDRLSKGIEELC 
LESAKTI IS IGVAYPHKLPQQPQKTPYKRGKITPSSWGLDYHYWGEKLDRLSKGIEELC 137 



PVDYDCGDCRRCLDACPTSCLIGDGSMNAKRCLSFQTQDKGMMDIEFRKKIKTVIYGCDI 



CQI CCPYNKGINN ATEIDPELAQPELIPFLSLSNG+FKEKFGMIAGSWRGKNILQRNA 



IIAIANAHDKTAWI<LIEIIDKNNNPIHTATAIWALGEIVKKPNDEIL FMS+LTLKDED 



SRKELELIRHKWQF 





Query: 1
Sbjct: 18
Query: 61
Sbjct: 138
Query: 181
Sbjct: 198
Query: 241
Sbjct: 258
Query: 301
Sbjct: 318
Query: 361
Sbjct: 378



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
Example 1767

A DNA sequence (GBSx1874) was identified in S.agalactiae <SEQ ID 5491> which encodes the amino
acid sequence <SEQ ID 5492>. This protein is predicted to be peptide chain release factor 2, fragment
(prfB). Analysis of this protein sequence reveals the following:

Possible site: 23

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4903 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC67303 GB:AF017113 putative peptide chain release factor RF-2 [Bacillus subtilis]

Identities = 194/335 (57%) , Positives = 251/336 (73%) , Gaps = 2/336 (0%) 

Query: 2 EKEIALLENQMTEPDFWNDNIAAQKTSQELMSLKGKYDTFHMMQELSDETELLLEMLDE- 60 
E IA L+ QM +P+FWND AQ E N LK 4++ + E +E ++ ++L E 
Sbjct: 30 FJUilAELDEQMADPEFWNDQQKAQTVINEANGLKDYVT^SYKKliNESHEELQMTHDLLKEE 89

Query: 61 -DDSLKEELEENLMQLDKIMGAYEMTLLLSEPYDHNNAILEIHPGSGGTEAQDVIGDLLLR 119 

D L+ ELE+ L L K +E+ LLLSEPYD NNAILE+HPG+GGTE+QDWG +LLR 
Sbjct: 90 PDTDLQLELEKELKSLTKEFNEFELQLLLSEPYDKNNAILELHPGAGGTEEQDWGSMLLR 149 


Query: 120 MyTRFGNANGFKvEVLDYQAGDEAGlKSVTLSFEGPNAYGLLKSEMGVHRLVRISPFDSA 179 

MYTR+G GFKVE LDY GDEAGIKSVTL +G NAYG LK+E GVHRLVRISPFDS+ 
Sbjct: 150 IvnrTRWGERRGFKVETLDYliPGDEAGIKSVTLLIKGHNAYGYLKAEKGVHRLVRISPFDSS 209 

Query: 180 KRRHTSFASVEVMPELDDTIEVBVRDDDIKMJTFRSGGAGGQNVNKv'STGVRLTHIPTGI 239

RRHTSF S EVMPE +D I++++R +DIK+DT+R+ GAGGQ+VN + VR+TH+PT + 
Sbjct: 210 GRRHTSFVSCEvMPEFNDEIDIDIRTEDIKVDTYRASGAGCQHWTTDSAWITHLPTNV 269 

Query: 240 WSSTVDRTQYGNRDRAMKMLQAKLYQLEQEKKAQEVDALKGDKKEITWGSQIRSYVFTP 299 
W+ +R+Q NR+RAMKML+AKLYQ E++ E+D ++G++KEI WGSQIRSYVF P

Sbjct: 270 WTCQTERSQIKNRERAMPCMLKAKLYQRRIEEQQAELDEIRGEQKEIGWGSQIRSYVFHP 329 

Query: 300 YTMVKDHRTNFELAQVDKVMDGEINGFIDAYLKWRI 335 
Y+MVKDHRTN E4 V VMDG+I+ FIDAYL+ ++ 
Sbjct: 330 YSMVKDHRTNTEMGNVQAVMDGDIDTFIDAYLRSKL 365
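The "Identities" figure reported for an alignment is a count of columns in which the query and subject rows carry the same residue. A minimal Python sketch of that count for a single alignment block is given below. It is an illustration only, not the program used to produce the alignments, and the function name `count_identities` is hypothetical. The two rows are copied from the final block of the alignment above.

```python
def count_identities(query: str, subject: str) -> tuple[int, int]:
    """Count identical columns in one gapped alignment block.

    Returns (identical_columns, total_columns). A '-' marks a gap and
    is never counted as a match.
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    identical = sum(
        1 for q, s in zip(query, subject) if q == s and q != "-"
    )
    return identical, len(query)


# Final block of the alignment above (Query 300-335, Sbjct 330-365).
query_row = "YTMVKDHRTNFELAQVDKVMDGEINGFIDAYLKWRI"
sbjct_row = "YSMVKDHRTNTEMGNVQAVMDGDIDTFIDAYLRSKL"
identical, columns = count_identities(query_row, sbjct_row)
print(f"{identical}/{columns} identical columns in this block")
```

Summing the per-block counts over every block of an alignment gives the numerator of the "Identities" line, provided the rows have been transcribed without error.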

A related DNA sequence was identified in S.pyogenes <SEQ ID 5493> which encodes the amino acid 

sequence <SEQ ID 5494>. Analysis of this protein sequence reveals the following: 

Possible site: 23
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.4779 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 334/337 (99%), Positives = 336/337 (99%)

Query: 1 MEEEIALLENQMTEPDFWNDNIAAQKTSQELNELKGKYDTFHNMQELSDETELLLEMLDE 60

+EEEIALLEN MTEPDFWNDNIAAQKTSQELNELKGKYDTFHNMQELSDETELLLEMLDE 
Sbjct: 1 LEEEIALLENHMTEPDF/^NlAAQKTSQEIMLKGKYDTFHNMQELSDETELLLEMIjDE 60 

Query: 61 DDSLECEELEENLMQLDKIMGAYEMTLLLSEPYDHNNAILEIHPGSGGTEAQDWGDLLLRM 120 
DDSLKEELEENLMQLDKIMGAYEMTLLLSEPYDHNNAILEIHPGSGGTEAQDWGDLLLRM

Sbjct: 61 DDSLKEELEENLMQLDKIMGAYEMTLIiLSEPYDHNNAILEIHPGSGGTEAQDWGDLLLRM 120 

Query: 121 YTRFGNANGFKVEVLDYQAGDHAGIKSOTLSFEGPNAYGLLKSEMGVHRLWISPFDSAK 180

YTRFGNANGFK+EVLDYQAGDEAGIKSVTLSFEGPNAYGLLKSEMGVHRLVR1SPFDSAK 
Sbjct: 121 YTRFGNANGFKIEVLDYQAGDEAGIKSVTLSFEGPNAYGLLKSEMGVHRLVRISPFDSAK 180 

Query: 181 RRHTSFASVEVMPELDDTIEVEVRDDDIKMDTFRSGGAGGQNVNKVSTGVRLTHIPTGIV 240 

RRHTSFASVEVMPELDDTIEVEVRDDDIKMDTFRSGGAGGQNVNKVSTGVRLTHIPTGIV 
Sbjct: 181 RRHTSFASVEVMPELDDTIEVEVRDDDIKMDTFRSGGAGGQNVNKVSTGVRLTHIPTGIV 240 

Query: 241 VSSTVDRTQYGNRDRAMKMLQAKLYQLEQEKKAQEVDALKGDKKEITWGSQIRSYVFTPY 300 

VSSTVDRTQYGNRDRAMKMLQAKLYQLEQEKKAQEVDALKGDKKEITWGSQIRSYVFTPY 
Sbjct: 241 VSSTVDRTQYGNRDRAMKMLQAiCLYQLEQEKKAQEVDALKGDKKEITWGSQIRSYVFTPY 300 

Query: 301 TMVKDHRTNFELAQVDKVMDGEINGFIDAYLKWRIED 337 

TMVKDHRTNFELAQVDKVMDGEINGFIDAYLKWRIED 
Sbjct: 301 TMVKDHRTNFELAQVDKVMDGEINGFIDAYLKWRIED 337 
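The summary line that precedes each alignment ("Identities = n/m (p%)") can be read mechanically. The Python sketch below parses such a line and recomputes the percentages. It assumes the printed percentage is the truncated integer part of n/m, which is consistent with the GAS/GBS figures above (336/337 is printed as 99%) but is not stated in the text. The names `parse_summary` and `truncated_percent` are hypothetical.

```python
import re

# Matches "Identities = 334/337", "Positives = 336/337", "Gaps = 2/336", etc.
SUMMARY_FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_summary(line: str) -> dict[str, tuple[int, int]]:
    """Map each field name on a summary line to (count, alignment_length)."""
    return {
        name: (int(count), int(length))
        for name, count, length in SUMMARY_FIELD.findall(line)
    }


def truncated_percent(count: int, length: int) -> int:
    """Integer part of the percentage (assumed reporting convention)."""
    return (100 * count) // length


summary = parse_summary("Identities = 334/337 (99%), Positives = 336/337 (99%)")
for name, (count, length) in summary.items():
    print(name, f"{count}/{length}", f"{truncated_percent(count, length)}%")
```

Recomputing the percentage in this way is a quick check for transcription errors in the printed counts.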

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1768 

A DNA sequence (GBSx1875) was identified in S.agalactiae <SEQ ID 5495> which encodes the amino
acid sequence <SEQ ID 5496>. This protein is predicted to be cell-division ATP-binding protein (ftsE).
Analysis of this protein sequence reveals the following:

N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.3928 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC67262 GB:AF017113 cell division ATP-binding protein [Bacillus subtilis] 
Identities = 138/228 (60%), Positives = 179/228 (77%)

Query: 3 LIEMSGWKKYRRSTTALRNIiNLSIQQGEFVYLVGPSGAGKSSLIRLLYREEKLSSGRLK 62 

+IEM V K Y AL ++++I GEFVY+VGPSGAGKS+ I+++YREEK + G++ 

Sbjct: 1 MIEMKEVYKAYPNGVKALNGISVTIHPGEFVYVVGPSGAGKSTFIKMIYREEKPTKGQIIj 60 

Query: 63 VGEFNraKLKRRQIPILRRSIGWFQDYKLLPTKTVYEWVAFAMQVIGAKRRHIKKRVPE 122 

+ +L +K ++IP +RR IGWFQD+KLLP TV+ENVAFA++VIG + IKKRV E 
Sbjct: 61 INHKDLATIKEKEIPFVRRKIGWFQDFKLLPKLTVFENVAFALEVIGEQPSVIKKRVLE 120 

Query: 123 VLELVGLKHMRSFPTQLSGGEQQRVAIARAIVNNPKLLIADEPTGNLDPEIAWEIMHLL 182 

VL+LV LKHK R FP QLSGGEQQRV+IAR+IVNNP ++IADEPTGNLDP+ +WE+M L 
Sbjct: 121 VLDLVQLKHKARQFPDQLSGGEQQRVSIARSIVNNPDWIADEPTGNLDPDTSWEVMKTL 180 

Query: 183 ERINLQGTTVLMATHNSQIVNTLRHRVIEIEAGSVIRDEEKGEYGYHD 230 

E IN +GTTV+MATHN +IVNT++ RVI IE G ++RDE +GEYG +D 
Sbjct: 181 EEINNRGTTVVMATHNKEIVNTMKKRVIAIEDGI IVRDESRGEYGSYD 228 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5497> which encodes the amino acid 
sequence <SEQ ID 5498>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.3728 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 191/230 (83%) , Positives = 214/230 (93%) 
Query: 
Sbjct: 

Query: 61 LKVGEFMjNKLKRRQIPILRRSIGWFQDYKLL 120 

L VGEFNL KLK R +PILRR IGWFQDYKLLP KTV+ENVA+AM+VIG KRRHIKKRV 
Sbjct: 61 LYVGEFNLTKLKARDVP I LRRH IGWFQDYKLLPRKTVFENVAYAMEVIGEKRRHI KKRV 120 

Query: 121 PEVLELVGLKHKMRSFPTQLSGGEQQRVAIARAIVNNPKLLIADEPTGNLDPEIAWEIMH 180 

PEVL+LVGLKHKMRSFP+QLSGGEQQRVAIARAIVNNPKLLIADEPTGNLDPEI+WEIM 
Sbjct: 121 PEVLDLVGLKHKMRSFPSQLSGGEQQRVAIARAIVMNPKLLIADEPTGNLDPEISWEIMQ 180 

Query: 181 LLERINLQGTTVLMATHNSQIVNTLRHRVIEIEAGSVIRDEEKGEYGYHD 230 

LLERIN+QGTT+LMATHNS IVNT RHRV+ IE G + +RDEEKG+ YGY D 
Sbjct: 181 LLERINVQGTTILMATHNSHIVNTFRHRWAIEDGRIVRDEEKGDYGYDD 230 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1769 

A DNA sequence (GBSx1876) was identified in S.agalactiae <SEQ ID 5499> which encodes the amino
acid sequence <SEQ ID 5500>. This protein is predicted to be ftsE protein (ftsX). Analysis of this protein
sequence reveals the following:
Possible site: 45

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-10.77 Transmembrane 296 - 312 ( 291 - 322)
INTEGRAL Likelihood = -9.24 Transmembrane 203 - 219 ( 198 - 228)
INTEGRAL Likelihood = -6.16 Transmembrane 49 - 65 ( 40 - 68)
INTEGRAL Likelihood = -3.40 Transmembrane 255 - 271 ( 252 - 273)

Final Results
bacterial membrane --- Certainty=0.5310 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
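The "Final Results" block assigns a certainty value to each candidate location, and the location with the highest value is the one marked "Affirmative". The Python sketch below reads such a block and selects that location. It is a hypothetical helper, not part of the prediction program itself, and its pattern is deliberately tolerant of stray spaces inside the numbers, since these are common in scanned copies of this output.

```python
import re

# One result line, e.g. "bacterial membrane --- Certainty=0.5310 (Affirmative)".
# Stray spaces inside the number (e.g. "0 . 5310") are accepted and removed.
RESULT_LINE = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*Certainty\s*=\s*([\d.\s]+?)\s*\("
)


def parse_final_results(text: str) -> dict[str, float]:
    """Return {location: certainty} for every result line found in text."""
    return {
        location: float(raw.replace(" ", ""))
        for location, raw in RESULT_LINE.findall(text)
    }


block = """
Final Results
bacterial membrane --- Certainty=0.5310 (Affirmative) < succ>
bacterial outside --- Certainty=0 . 0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0. 0000 (Not Clear) < succ>
"""

results = parse_final_results(block)
predicted = max(results, key=results.get)
print(predicted, results[predicted])
```

For the block above, taken from this Example, the helper selects the membrane, in agreement with the four predicted transmembrane segments.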

A related GBS nucleic acid sequence <SEQ ID 9629> which encodes amino acid sequence <SEQ ID 9630> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC67264 GB:AF017113 cell division protein [Bacillus subtilis] 
Identities = 112/311 (36%), Positives = 182/311 (58%), Gaps = 31/311 (9%) 

Query: 27 RHFVffiSLKNLKRNFVMTFASWSWlTLLLVGLFSSvLUSlvEKLTTDVS 86 

RH ES K+L RN WMTFAS+++VT+TL+LVG+F ++LN+ + T+ I +++ 

Sbjct: 7 RHLRES FKSLGRNTWMTFAS I SAVTVTTjILVGWLVIMLNLNNMATNAEKQVE I KVLIDL 66 

Query: 87 DSTDAQKQVKDKDGKLKDNPDYHKVYDKIKRISGVEKVTYSSKAEQLKEVQKEYGSDVID 146 

+ D K +D K+ + IK + G++ VT+SSK ++L ++ +G 

Sbjct: 67 TA DQKAQD KLQNDIKELKGIQSVTFSSKEKELDQLVDSFGDSGKS 111 

Query: 147 DTYKDA LLDVYWGTSSAKVSKSVSEAIGRIEGV DYTKEPIDST-KLSNLTDNI 199 

T KD L D +W T+ + +V++ I +++ V Y KE + K+ ++ NI 
Sbjct: 112 LTMKDQFJSTPIMDAFWKTTDPHDTPNVAKKIEKMDHVYKVTYGKEEVSRLFKWGVSR^ 171 

Query: 200 RIWGFGGVALLIVL— -AIFLISNTIRMSIMSRRTDIEIMRLVGAKNSYIRGPFFFEGAW 256 

Query: 257 VGILGAIVPSLIFYFGYQFVFNKFNPKFETSHV8LYPMDIMVPAIIGGMVIIGIIIGSLG 316

+G+ G+++P + YQ+V PK + S VSL P + V + ++ IG +IG G 

Sbjct: 226 LGVFGSVIPIALVLSTYQYVIGWWPKVQGSFVSLLPYNPFVFQVSLVLIAIGAVIGVWG 285 

Query: 317 SVLSMHRYLKI 327

S+ S+R++L+4- 
Sbjct: 286 SLTSIRKFLRV 296

A related DNA sequence was identified in S.pyogenes <SEQ ID 5501> which encodes the amino acid
sequence <SEQ ID 5502>. Analysis of this protein sequence reveals the following: 

Possible site: 51
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -7.70 Transmembrane 195 - 211 ( 189 - 219)
INTEGRAL Likelihood = -6.74 Transmembrane 39 - 55 ( 30 - 58)
INTEGRAL Likelihood = -5.52 Transmembrane 294 - 310 ( 288 - 314)
INTEGRAL Likelihood = -1.49 Transmembrane 246 - 262 ( 245 - 263)

Final Results
bacterial membrane --- Certainty=0.4079 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAC67264 GB:AF017113 cell division protein [Bacillus subtilis] 
Identities = 117/311 (37%), Positives = 184/311 (58%), Gaps = 19/311 (6%) 

M DQ+ NPL D ++++T P



L+F A+FLISNTI ++TI +R+++IEIM+LVGA N +IR PFF EG 



GS+ S+R++L+ 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 173/318 (54%) , Positives = 238/318 (74%) , Gaps = 5/318 (1%) 

Query: 13 MKRRENMVIMIN-FFRHFWHSLKNLKPJIFIflOTFASVTSW 71 

MK++E MV MI FFRH WES+KKLKRNFWMTFASV+ V +TL LVG+F++ LLN++++ 
Sbjct: 2 MKKKEIIWTMIRYFFRHIlTO3IKKXK™FimTFASVSM\ r AVTLTLVGVFAATLLNIQRVA 61 

Query: 72 TDVSGNFTI SAFLiNVDSTDAQKQVKDKKSKLKDNPD 131
+ V N 1+ +L VDSTDA K +++ G+ +N +YH VYDKI +1 GV+K+T+SSK E 
Sbjct: 62 SGVENNraiNTYLQVDSTDAAKVIQN^ 121 

Query: 132 QLKEVQKEYGSDVID--DTYKEALLDVYWGTSSAKVSKSVSEAIGRIEGVDYTKEP-ID 188
QLK++Q+ G DV + D + h D+Y++ T + K K++++ I IEGV+ 1+ 
Sbjct: 122 QLKKLQETLG-DVWMYDQDTNPLQDIYLIETQTPKQVKA.ITKKIRTIEGVEAADYGGIN 180 

Query: 189 STKLSNLTDNIRIWGFGGVALLIVLAIFLISNTIRMSIMSRRTDIEIMRLVGAKNSYIRG 248 

S KL + 1+ WG G A+L+ +A+FLISNTIRM+IMSR+ DIEIMRLVGAKNSYIRG 
Sbjct: 181 SDKLFKFSTLIQTWGLIGTAMLLFVAVFLISNTIRMTIMSRKRDIEIMRLVGAKNSYIRG 240 

Query: 249 PFFFEGAWVGILGAIVPSLIFYFGYQFVFNKraPKFETSHVSLYPMDIMVPAIIGGMVII 308 

PFFFEGAWVG+LGA++PSL+ Y+GY V+ F + + +++S+YP+D V +IG + +1 
Sbjct: 241 PFFFEGAWVGLLGAVLPSLLIYYGYDLOTKHFAQELQRNNLSMYPLDPYVYYLIGALFVI 300 

Query: 309 GIIIGSLGSVLSMRRYLK 326

GI+IGSLGSVLSMRRYLK
Sbjct: 301 GIMIGSLGSVLSMRRYLK 318

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1770 

A DNA sequence (GBSx1877) was identified in S.agalactiae <SEQ ID 5503> which encodes the amino
acid sequence <SEQ ID 5504>. This protein is predicted to be carboxymethylenebutenolidase-related
protein. Analysis of this protein sequence reveals the following:

N-term signal seq.

Final Results
bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF10898 GB:AE001979 carboxymethylenebutenolidase-related 
protein [Deinococcus radiodurans] 
Identities = 65/183 (35%), Positives = 98/183 (53%), Gaps = 3/183 (1%) 

Query: 56 SKGKYKANIIFYQGAIjVEEEAYSQLARDIiADKGDNTYILKTPI^PVLSPHKAKTIINQN 115 

+ +VK ++FY G V +AY L R LA +G T I PL+L + +A+ +1 + 
Sbjct: 100 ASAEVKTLLVFYPGGRVRPQAYEWLGRALAVRGVQTVIPAFPLDLAITGTERAEGLIARY 159 

Query: 116 HL-TNVYLAGHSLGGWASQNAKVAP--WGLILLASYPSRKSDLSHKNLRVLSITASND 172 

V LAGHSLGG VA+Q A + P + GL+LLA+YP+ +L LS+ A D 

Sbjct: 160 GAGKEVVLAGHSLGGTVAAQYAALRPDKIDGLLLLAAYPAPNVNLHDARFPALSLLAEKD 219 

Query: 173 HILNWEKYEEAKKRLPNSSTFRTIVGGNHSRFGNYGHQKGDGKATLSHKSSEKQLATFIS 232 

+ + +RLP ++ + G HS FG YG Q+GDG T+S +E+++ + 

Sbjct: 220 GVADAGLWGGLERLPKOTRLTVLPGAVHSFFGRYGPQQGDGVPTVSRARAEREIVQAVE 279 

Query: 233 NFI 235 
FI 

Sbjct: 280 TFI 282 

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 5504 (GBS158) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 26 (lane 4; MW 27kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 37 (lane 5; MW 52kDa). 



WO 02/34771 



PCT/GB01/04789 




The GBS158-GST fusion product was purified (Figure 113; see also Figure 201, lane 4) and used to 
immunise mice (lane 1+2 product; 14.5ug/mouse). The resulting antiserum was used for Western blot, 
FACS, and in the in vivo passive protection assay (Table III). These tests confirm that the protein is 
immunoaccessible on GBS bacteria and that it is an effective protective immunogen. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1771 

A DNA sequence (GBSx1878) was identified in S.agalactiae <SEQ ID 5505> which encodes the amino
acid sequence <SEQ ID 5506>. Analysis of this protein sequence reveals the following:



Final Results
bacterial cytoplasm --- Certainty=0.0281 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06539 GB:AP001516 unknown conserved protein [Bacillus halodurans] 
Identities = 83/197 (42%), Positives = 114/197 (57%), Gaps = 4/197 (2%)



1+ DPG +++I ++ + +AILLTH H+DHI +++ VR+TF ^ 



PVY+ E E WL P N S 



D++V +GDALF 



A related DNA sequence was identified in S. pyogenes <SEQ ID 5507> which encodes the amino acid 
sequence <SEQ ID 5508>. Analysis of this protein sequence reveals the following: 

Possible site: 54

>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.0407 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 217/231 (93%) , Positives = 224/231 (96%) 

Query: 1 MPFIFRHSFFNKVLIFWYTIIMKIYKTINHIAGENTYYLVNDQAVILIDPGSNGQEIIAK 60 

+PFIFR+SFFNKVLIFWYTI+MKIYKTINHIAGENTYYLVNDQAVILIDPGSNGQEIIAK 
Sbjct: 1 LPFI FRYSFFNKVLI FWYTILMKIYKXINHIAGENTYYLVMDQAVILIDPGSNGQEI IAK 60 



Query: 61 IKSFEKPLVAILLTHTHYDHIFSLDLVRDTFDNPPVYVSEKEAAWLSSPDDNLSGLGRHD 120

IKSFEKPLVAILLTHTHYDHIFSLDLVRD FD+PPVYVSEKEAAWLSSPDDNLSGLGRHD 
Sbjct: 61 IKSFEKPLVAILLTHTHYDHIFSLDLVRDAFDHPPVYVSEKEAAWLSSPDDNLSGLGRHD 120

Query: 121 DIINVIARPAENFPJQjKQPYQLNGFEF'TVLPTPGHSWCSGV'SFVFHSDELWTGDALFRET 180 

DII VIARPAENFFKLKQPYQLNGFEFTVLPT GHSWGGVSFVFHSDELWTGDALFRET 
Sbjct: 121 DIITVIARPAENFFKLKQPYQDNGFEFTVLPTSGHSWGGVSFVFHSDELWTGDflLFRET 180 

Query: 181 IGRTDLPTSNFEDLITGIRQELFTLPSHYSVHPGHGMNTTIGHEKNFNPFF 231 

IGRTDLPTSNFEDLITGIRQELFTLP+HY V+PGHG +TTI HEKN NPFF 
Sbjct: 181 IGRTDLPTSNFEDLITGIRQELFTLPNHYRVYPGHGPSTTICHEKNANPFF 231 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1772 

A DNA sequence (GBSx1879) was identified in S.agalactiae <SEQ ID 5509> which encodes the amino
acid sequence <SEQ ID 5510>. This protein is predicted to be acetoin reductase (fabG). Analysis of this
protein sequence reveals the following:

N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.1596 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9631> which encodes amino acid sequence <SEQ ID 9632>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC48769 GB:U71200 acetoin reductase [Bos taurus] 
Identities = 162/254 (63%), Positives = 188/254 (73%), Gaps = 2/254 (0%) 



- F++ G+GGKI INATSQAG GNPNL++Y TKFAVR +T A+DLA + ITVNAYAP 



GIVKTP FDIAHEVGKNAGKDDEWGM+ FAKDI LKRLSEPEDVA AV FLAG DSNYI 



TGQTI VDGGM FH 
TGQTIEVDGGMQFH 257 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5511> which encodes the amino acid
sequence <SEQ ID 5512>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.1131 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 209/213 (98%) , Positives = 212/213 (99%) 

Query: 1 MTKEYEVEDMSKVAIVTGAGQGIGFAIAKRLHADGFKIGVLDYNEETAQAAVDKLSPEDA 60

+TK+YEVEDMSKVAIVTGAGQGIGFAIAI<RLHADGFKIG+LDYNEETAQAAVDKLSPEDA 
Sbjct: 1 LTKK3EVEDMSKVAIWGAGQGIGFAIAKRLHADGFKIGILDYNEETAQAAVDKLSPEDA 60 

Query: 61 VAWADVSKRDQVFDAFQKWDTFGDLNVWNNAGVAPTTPLDTITEEQFEKAFAINVGG 120

VAWADVSKRDQVFDAFQKVVDTFGDLNWVNNAGVAPTTPLDTITEEQFEKAFAINVGG 
Sbjct: 61 VAWADVSKRDQVFDAFQKOTDTFGDLKVWNNAGVAPTTPLDTITEEQFEKAFAINVGG 120 

Query: 121 TIWGSQAAQKHFRELGHGGKIINATSQAGCEGNPKLTVYGGTKFAVRGITQTLAKDLASE 180 

TIWGSQAAQKHFRELGHGGKIINATSQAGCEGNPNLTVYGGTKFAWGITQTLAKDLASE 
Sbjct: 121 TIWGSQAAQKHFRELGHGGKIINATSQAGCEGNPKLTVYGGTKFAVRGITQTLAKDLASE 180 

Query: 181 GITVNAYAPGIVKTPMMFDIAHEVGKNAGKDDE 213 

GITVNAYAPGI VKTPMMF IAHEVGKNAGKDDE 
Sbjct: 181 GITVNAYAPGIVKTPMMFAIAHEVGKNAGKDDE 213 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1773 

A DNA sequence (GBSx1880) was identified in S.agalactiae <SEQ ID 5513> which encodes the amino
acid sequence <SEQ ID 5514>. This protein is predicted to be ATP-dependent DNA helicase. Analysis of
this protein sequence reveals the following:

N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.3735 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB38451 GB:L47709 22.4% identity with Escherichia coli 

DNA-damage inducible protein . . . ; putative [Bacillus subtilis] 
Identities = 132/461 (28%), Positives = 231/461 (49%), Gaps = 22/461 (4%) 



+ + F VA ++QL++ FVAHN+ FD + r-L G +L + DTVELS 

QMVENEQPFEAVAEEVFQLLDGAYFVAHNIHFDLGFVKYELHKAGFQLPDCEVLDTVELS 123 

QVFYPCLEKYSLGAIiAESLNIELTDAHTAIADARATAQLFIKLKAKISSLPKEVLETILT 197 
++ +P E Y L L+E L + HA +DA T +F+++ K+ LP L+ + 

RIVFPGFEGYKLTELSEEIjQLRHDQPHRADSDAEVTGLIFIjEILEKLRQIjPYPTLKQLRR 183 

FADNLLFESYLLIEEAYQEADFVNPKETYYFWCGLVLKKEKAVGKPKKLSSDFQ 250 

+ + + + L++ E Y + +++ +A+ +F 

LSQHFISDLTHLLDMFINENRHTEIP3YTRFSSFS\/REPEAIDVRINEDENFSFEIESWE 243 

Query: 303 SQKQQI I VSVPTKI LQDQIMAKEI KHIQELFHI PCHS - - 1 KGPRNYLKLDAFYKSLQ VQD 360

K+ +I+S + +LQ QI+ K++ +Q+LF P + +KG +YL L F + L +D 
Sbjct: 304 KSKKPVIISTYSTLLQQQILTKDLPIVQDLFPFPVTAAILKCSQSHyLCLYKFEQVIiffiED 363 

Query: 361 RNRLINRFKMQLLVWLTETTTGDLDEIKQKQRLESYFDQLKHDGE-VTQSSLFYDLDFWK 419 

N K QLLVWLTET TGD+ E+ + +D+L +D + +S + + F++ 

Sbjct: 364 DlSnfDAVLTKAQLLWLTETOTGDVJ^LNtjPSGGKLLWDRIAYDDDSYKRSRSEHVIGFYE 423 

Query: 420 RSYDKVAQSQLVIINHAYFL-ERVQDDKDFAKGKVLVFDEA 459 

R+ +S LVI NH+ L + K + + DEA 

Sbjct: 424 RAKQIAMRSDLVITNHSLLLTDEGSHKKRLPESGTFIIDEA 464 
Identities = 63/195 (32%) , Positives = 88/195 (44%) , Gaps = 16/195 (8%) 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5515> which encodes the amino acid
sequence <SEQ ID 5516>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.3735 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 500/835 (59%), Positives = 626/835 (74%), Gaps = 2/835 (0%) 

mfcfidiacynrltmtqkklrkyavvdleatgagpnasiiqvgiviiqgnkiidsyetdv 60 
mfcfidiacynrltmtqkklrkyawdleatgagpnasiiqvgiviiqgnk1idsyetdv 
mfcfidiacynrltmtqkklrkyavvdleatgagpnasiiqvgiviiqgnkiidsyetdv 60 

nphesldehivhltgitdkqlakapdfgqvahhiyqliebcif\fahnvkfdanllaeqlf 120 
nphesldehivhltgitdkqlakapdfgqvahhiyqliedcifvahnvkfdanllae lf 
nphesldehivhi,tgitdkqijucapdfgqvahh:yqliedcif\/ahnvkfdanliaealf 120 

legcelrtpridtwlsqvfypclekyslgalaeslnieltdahtaiadarataqlfikl 180 
leg el pr+dtvel+q+f+p eky+l l+ lni+l +ahtaiadarata lf++l 



KI SLP E LE++L ++D+LLFE+ ++I+E +A +P +Y + ++L K 



KP ++S F +NMALLG++ RPKQ FA L+ ++ +F+EAQ G+GKTYGYLLPLL 

Query: 301 DQSQKQQIIVSVPTKILQDQIMAKEIKHIQELFHIPCHSIKGPRNYLKLDAFYKSLQVQD 360
+ + QIIVSVPTK+LQDQ+MA E+ IQE FHI CHS+KGP NYLKLD+F SL D
Sbjct: 301 AKEDQNQIIVSVPTKLLQDQLMAGEVAAIQEQFHIACHSLKGPAlIYLKIiDSFADSLDQND 360

Query: 361 RNELINRFKMQLLWLTETTTGDLDEIKQKQRLESYFDQLKHDGEVTQSSLFYDLDFWKR 420
+NRL+NR+KMQLLVWL ET TGDLDEIKQKQR +YF+QLKHDG++ QSS FYD DFW+
Sbjct: 361 QNRLWRYKMQLLVWLLETKTGDLDEIKQKQRFAAYFEQLKHDGDIKQSSEFYDYDFWRV 420

Query: 421 SYDIWAQSQLVIINHAYFLERVQDDKDFAKGKVLVFDEAQKLVLGLENFSRGQLDISHQL 480 

SY+K ++L+I NHAYFL RVQDDKDFA+ KVLVFDEAQKL+L L+ SR QL+++ Ii 
Sbjct: 421 SYEKAKTARLLira^YFLHRVQDDKDFARNJCVLVFDEAQKLMLQLDQLSRHQLNLTVFL 480 


Query: 481 QVIQKIIDSSIPLLQKRLLESISYELSHAVEIiFYRHNSFEFSETWLKRLKNSINALEWG 540 

Q IQ + + +PLL+KRLLES+S+EL +Y++ + + W R+ L 

Sbjct: 481 QTIQAKLSNPLPLLEKRLLESLSFELGQVSSDYYQNKEHQLAHDW-SRIAGYAKELTGAD 539 

Query: 541 LDELQTFFTATYTNYWFETDKVNEKRLTILRGAREDFLKFSKFLPPTKKTYMISATLQIS 600

ELQ FF + +YW 4-+K EKR+T L A + F+ F 4 LP T KTY +SATL IS 
Sbjct: 540 YQELQAFFATSDGDYWLSSEKQEEKRVTYLNSASKAFIHFQQLLPETVKTYFVSATLTIS 599 

Query: 601 PKVYLSDLLGGFSSISTEKIAHEKNANQKVWIDTSMPNILDLSPEQYAYEIAKRLQDIMT 660
+V L+DLL GF I +K +Q V +D P + ++S + Y IAKR++ +

Sbjct: 600 SEVTLADLL-GFEEYLYHVIEKDKKQDQLVLVDQSAPIVTEVSDQIYVEAIAKRIESLKQ 658 

Query: 661 LKQPTLVLLTSKQTMFMVSDYLDKWE I KHLTQDKNGLAYNVKKRFDRGESNLLLGTGS FW 720 
P LVL SK+ + +VSDYLD+W++ HL Q+KNG AYN+ KKRFD+GE +LLG GSFW 
Sbjct: 659 EGYP ILVLFNSKKHLLLVSDYLDQWQVPHLAQEKNGTAYNI KKRFDQGEQT I LLGLGS FW 718

Query: 721 EGVDFVHRDRLIEVITRLPFDTPKDYFIQKLSQSLTKEGKNFFYDYSLPMTVLKLKQALG 780 

EGVDF+ DR+I +1 RLPFD P+D+F++K+S L ++GKN F DY LPMT+L+LKQA+G 
Sbjct: 719 EGVDFIQADRMITLIARLPFDNPEDFFVKKMSHYLLEKGKNPFRDYFLPMTILRLKQAIG 778 


Query: 781 RTTRREEQKSAVIILDSRLVIKSYGQTIMHSLGRDFEISKEKINKVLTEMAKFLI 835 

RT RR++QKS VIILD RL+ KSYGQ 1+ LG++F IS++ + L E FLI 
Sbjct: 779 RTMRRQDQKSWIILDRRLLTKSYGQVILEGLGQEFLISQQNFHDCLVETDCFLI 833 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1774 

A DNA sequence (GBSx1881) was identified in S.agalactiae <SEQ ID 5517> which encodes the amino
acid sequence <SEQ ID 5518>. Analysis of this protein sequence reveals the following:

Possible site: 27

>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2042 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9633> which encodes amino acid sequence <SEQ ID 9634> 
was also identified. 



The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF12702 GB:AF035157 aspartate aminotransferase [Lactococcus 
lactis] 

Identities = 270/391 (69%) , Positives = 314/391 (80%) 

Query: 7 MTYLSERVIiNMEESVTLAAGAKARELRVQGRDILSLTLGEPDFATPKNIQQAAIEAITDG 66

M S+ VL M+ESVTLAA +A+ L+ QGRDI+ LTLG+PDF TPK I QAAIEAI 4G 
Sbjct: 1 MKKCSDFVLKMDESVTLAAANRAKALKAC^RDIIDLTLGQPDFPTPKKIGQAAIE!AINNG 60 






Query: 67 RASFYTPSSGLPELKSAINAYFERFYGYSLKPNQWVGTGAKFILYTFFMTVLNPGDEVI 126 
+ASFYT + GLPELK A+ Y+ RFY Y ++ N++++ GAKF LY +FM ++P DEVI 
Sbjct: 61 QASFYTQAGGLPELKKAVQHYWTRFYAYEIQTNZILITAGAKFALYAYFMATVDPLDEVI 120

Query: 127 IPTPYWVSYADQIKMAEGKPVFVTAKEVlffiFKVTVEQLEKTOTDKTKVILLNSPSNPTGM 186 

IP PYWVSY DQ+KMA G PV V AK+ N+FKVTVEQLE RT KTK+ +LLNS PSNPTGM 
Sbjct: 121 IPAPYWSYVDQVKMAGGNPVIVFAKQENNFK'i/TVEQLEKARTSKTKILLLNSPSNPTGM 180 

Query: 187 IYKAEELEAIGNWAVEHDILILADDIYGRLVYNGNIFTPISSLSESIRNQTIVINGVSKT 246 

IY EEL AIG WAV HD+LILADDIY RLVYNG FT ISSLS+ IRN+T VIWGVSKT 
Sbjct: 181 IYSKEELTAIGEWAVAHDLLILADDIYHRLVYNGAEFTAISSLSDEIRNRTTVINGVSKT 240 



Query: 307 RLNIIYPLLCQVPGFEVVKPQGAFYLFPJWTKAMEMKGYTDVTAFTDAILEEVGLALVTG 366 

RLN IY L +VPGFE+VKP GAFYLFP VTKAM MKGYTDVT FT AILEE G+ALVTG 
Sbjct: 301 RliNKIYLQLSEVPGFELWPNGAFYLFPKVTKAMAMKGYTDVTDFTTAILEEAGVALVTG 360 

Query: 367 AGFGAPENVRLSYATDLETLKEAVRRLHVFM 397 

AGFG+PENVRLSYAT LETL+ AV RL +M 
Sbjct: 361 AGFGSPENVRLSYATSLETLEAAVTRLKDWM 391

A related DNA sequence was identified in S. pyogenes <SEQ ID 1005> which encodes the amino acid 
sequence <SEQ ID 1006>. Analysis of this protein sequence reveals the following: 

Possible site: 30

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.48 Transmembrane 95 - 111 ( 95 - 113)

Final Results
bacterial membrane --- Certainty=0.1192 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 301/397 (75%), Positives = 343/397 (85%)

Query: 7 MTYLSERVLNMEESVTLAAGAKARELRVQGRDILSLTLGEPDFATPKNIQQAAIEAITDG 66 

M LS+RVL M+ESVTLAAGA+A+ L+ QGRD+L+LTLGEPDF TPK+IQ AIE+I +G 
Sbjct: 1 MPKLSKRVLEMtCESOTLAAGARAKALKAQGRDVLNLTLGEPDFFTPKHIQDKAIESIQNG 60 

Query: 67 RASFYTPSSGLPELKSAINAYFERFYGYSLKPNQWVGTGAKFILYTFFMTVLNPGDEVI 126 

ASFYT +SGLPELK+AI Y + YGY L P+Q+V GTGAKFILY FFM VLNPGD+V+ 
Sbjct: 61 TASFYTNASGLPELKAAIATYLKNQYGYHLSPDQIVAGTGAKFILYAFFMAVLNPGDQVL 120 

Query: 127 IPTPYWSYADQIKMAEGKPVFVTAKEVNHFK'/TVEQLEAWTDICrKVIIjLNSPSNPTGM 186 

IPTPYWVSY+DQ+KMAEG+P+FV E N FKVTV+QLE RT KTKV+L+NSPSNPTGM 
Sbjct: 121 IPTPYWSYSDQVKMAEGQPIFVQGLEENQFKVTVDQLERARTSKTKVVLINSPSNPTGM 180 

Query: 187 IYKAEELEAIGNWAVEHDILILADDIYGRLVYNGNIFTPISSLSESIRNQTIVINGVSKT 246 

IY AEEL AIG WAV +DILILADDIYG LVYNGN F PIS+LSE+IR QTI +NGV+K+ 
Sbjct: 181 IYGAEELRAIGEWAVHITOILIIjyDDIYGSLvYNGNQFVPISTLSEAIRRQTITVNGVAKS 240 

Query: 247 YAMTGWRVGFAVGNHDIIAAMSKWSQTTSNLTAVSQYATIEALNGSQESFEKMRLAFEE 306 

YAMTGWRVGFA G +II+AMSK++ QTTSNLT VSQYA IEA GSQ S E+MRLAFEE 
Sbjct: 241 YAMTGWRVGFAAGEPEIISAMSKIIGQTTSNLTTVSQYAAIEAFCGSQSSLEEMRLAFEE 300 

Query: 307 RIjNIIYPLLCQVPGFEVVKPQGAFYLFPNVTKAMEMKGYTDVTAFTDAIIjEEVGLALVTG 366 

RLNI YPLLCQVPGFEWKPQGAFY FPNV KAMEM G++DVT+F +AILEEVGLA+V+G 
Sbjct: 301 RliNITYPLLCQVPGFEVVKPCOAFYFFPNVKKAMEMTGFSDVTSFANAILEEVGLAVVSG 360 






Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1775 

A DNA sequence (GBSx1882) was identified in S.agalactiae <SEQ ID 5519> which encodes the amino
acid sequence <SEQ ID 5520>. This protein is predicted to be asparaginyl-tRNA synthetase (asnS).
Analysis of this protein sequence reveals the following:

N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.1488 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.



Sbjct 
Query: 



S IVDVKDYVGQEVTIGAWVANKSGKGKIAFVQLRDGSAFFQGVAFKPNFIBKYGEESGLE 6 6 
+ 1 + YV QEVT+GAW+ANK GKIAF+QLRDG+ F QGV K E G E 

TIAKIGQYVDQEVTLGAWLANKRSSGKIAFLQLRDGTGFIQGVWKA EVGDE 55 



Query: 127 NRHLWLRSRKQMAVMQIRNAIIYSTYEFFDQNGFIKFDSPILSENAAEDSTELFETDYFG 186
+RHLW+RSRKQ AV++IRN II +TYEFF +NGF+K D PIL+ +A E +TEBF T YF 
Sbjct: 116 HRHLWIRSRKQHAVLRIRNEIIRATYEFFHENGFVKVDPPILTGSAPEGTTELFHTKYFD 175 

Query: 187 KPAFLSQSGQLYLEAGAMALGRVFDFGPVFRAEKSKTRRHLTEFWMMDAEYSFLSHEESL 246 

+ AFLSQSGQLY+EA A+A GRVF FGP FRAEKSKTRRHL EFWM++ E +F+ EESL 
Sbjct: 176 EDAFLSQSGQLYMEAAALAFGRVFSFGPTFRaEKSKTRRHLIEFWMIEPEMAFVEFEESL 235 

Query: 247 DLQEAYVKALIQGVLDRAPQALDILERDVEALKRYIAEPFKRVSYDDAITLLQEHEADED 306 

++QE YV ++Q VL L L RD L+ I PF R+SYDDAI L E D+ 

Sbjct: 236 EIQENYVAYIVQSVLKHCAIELKTLGRDTSVLES-IQAPFPRISYDDAIKFLHEKGFDD- 293 

Query: 307 TDYEHLEHGDDFGSPHETWISISrYFGVPTFWNYPASFKAFYMKPVPGNPERVl,CADIjliAP 366 

+E GDDFG+PHET 1+ +F P F+ +YP S K FYM+P P + VLCADL+AP 
Sbjct: 294 IEWGDDFGAPHETAIAEHFDKPVFITHYPTSLKPFYMEPDPNRDDWLCADLIAP 348 

Query: 367 EGYGEIIGGSMREDDYDALVAKMDELGriDKSEYDFYLDLRKYGSVPHGGFGIGIERMVTF 426 

EGYGEIIGGS R DYD L +++E + Y +YLDLRKYGSVPH GFG+G+ER V + 
Sbjct: 349 EGYGEIIGGSQRISDYDLLKKRLEEHDLSLDAYAWYLDIjRKYGSVPHSGFGLGIjERTVGW 408 

Query: 427 VAGTKHIREAI PFPRMLHRIKP 448 

++G H+RE IPFPR+L+R+ P 
Sbjct: 409 I SGAGHVRETI PFPRLLNRLYP 430 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5521> which encodes the amino acid 
sequence <SEQ ID 5522>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.1488 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 443/448 (98%) , Positives = 447/448 (98%) 

Query: 1 MSKKLISIVDVKDYVGQEVTIGAWVAKKSGK6KIAFVQLRDGSAFFQGVAFKPNFIEKYG 60 

MSKKLISIVDVKDYVGQEVTIGAWVANKSGKGKIAFVQLRDGSAFFQGVAFKPNFIEKYG 
Sbjct: 1 MSKKLISIVDVKDYVGQEVTIGAWVANKSGKGKIAFVQLRDGSAFFQGVAFKPNFIEKYG 60

Query: 61 EESGLEKFDVIKRLNQETSVYVTGIVKEDERSKFGYELDITDLEVIGESHEYPITPKEHG 120 

Sbjct: 61 EESGLEKFDVIKRLNQETS\mTGIVKEDERSKFGYELDITDLEIIGESHEYPITPKEHG 120 

Query: 121 TDFLMDNRHLWLRSRKQMAVMQIRNAIIYSTYEFFDQNGFIKFDSPILSENAAEDSTELF 180 

TDFLMDNRHLWLRSRKQMAVMQIRNAI IY+TYEFFDQNGFIKFDSPILSENAAEDSTELF 
Sbjct: 121 TDFLMDNRHLWLRSRKQMAVMQIRNAI I YATYEFFDQNGFIKFDSPILSENAAEDSTELF 180 

Query: 181 ETDYFGKPAFLSQSGQLYLEAGAMALGRVFDFGPVFRAEKSKTRRHLTEFWMMDAEYSFL 240 

ETDYFGKPAFLSQSGQLYLEAGAMALGRVFDFGPVFRAEKSKTRRHLTEFWMMDAEYSFL 
Sbjct: 181 ETDYFGKPAFLSQSGQLYLEAGAMALGRVFDFGPVFRAEKSKTRRHLTEFWMMDAEYSFL 240 

Query: 241 SHEESLDLQEAYVKALIQGVLDRAPQALDILERDVEALKRYIAEPFKRVSYDDAITLLQE 300 

SHEESLDLQEAYVKALIQGVLDRAPQALDILERDVEALKRYI EPFKRVSYDDAITLLQE 
Sbjct: 241 SHEESLDLQEAYVKALIQGVLDRAPQALDILERDVEALKRYITEPFKRVSYDDAITLLQE 300 

Query: 301 HEADEDTDYEHLEHGDDFGSPHETWISNYFGVPTFVVNYPASFKAFYMKPVPGNPERVLC 360 

HEADEDTDYEHLEHGDDFGSPHETWISNYFGVPTFWNYPASFKAFYMKPVPGNPERVLC 
Sbjct: 301 HEADEDTDYEHLEHGDDFGSPHETWISNYFGVPTFWNYPASFKAFYMKPVPGNPERVLC 360

Query: 361 ADLLAPEGYGEIIGGSMREDDYDALVAKMDELGMDKSEYDFYLDLRKYGSVPHGGFGIGI 420 

ADLLAPBGYGEIIGGSMRED+YDALVAKMDELGMDKSEYDFYLDLRKYGSVPHGGFGIGI 
Sbjct: 361 ADLLAPEGYGEIIGGSMREDNYDALVAKMDELGMDKSEYDFYLDIJIKYGSVPHGGFGIGI 420

Query: 421 ERMVTFVAGTKIIIREAIPFPRMLPiRIKP 448 

ERMVTFVAGTKHIREAI PFPRMLHRI+P 
Sbjct: 421 ERMVTFVAGTKHIREAIPFPRMLHRIRP 448 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1776 

A DNA sequence (GBSx1883) was identified in S.agalactiae <SEQ ID 5523> which encodes the amino
acid sequence <SEQ ID 5524>. Analysis of this protein sequence reveals the following:
Possible site: 17

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -6.85 Transmembrane 103 - 119 ( 102 - 127)
INTEGRAL Likelihood = -5.04 Transmembrane 73 - 89 ( 68 - 93)
INTEGRAL Likelihood = -4.19 Transmembrane 31 - 47 ( 31 - 49)
INTEGRAL Likelihood = -1.86 Transmembrane 157 - 173 ( 157 - 173)

Final Results
bacterial membrane --- Certainty=0.3739 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
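Each "INTEGRAL" line reports one predicted membrane-spanning segment: a likelihood score followed by the first and last residue of the segment. The Python sketch below extracts these segments and orders them along the sequence. It is illustrative only. The `Segment` type and `parse_segments` function are hypothetical names, and the bracketed outer range on each line is ignored.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    likelihood: float  # score printed after "Likelihood ="
    start: int         # first residue of the predicted segment
    end: int           # last residue of the predicted segment


INTEGRAL_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_segments(text: str) -> list[Segment]:
    """Return the predicted segments ordered by position in the sequence."""
    found = [
        Segment(float(score), int(start), int(end))
        for score, start, end in INTEGRAL_LINE.findall(text)
    ]
    return sorted(found, key=lambda seg: seg.start)


report = """
INTEGRAL Likelihood = -6.85 Transmembrane 103 - 119 ( 102 - 127)
INTEGRAL Likelihood = -5.04 Transmembrane 73 - 89 ( 68 - 93)
INTEGRAL Likelihood = -4.19 Transmembrane 31 - 47 ( 31 - 49)
INTEGRAL Likelihood = -1.86 Transmembrane 157 - 173 ( 157 - 173)
"""

segments = parse_segments(report)
for seg in segments:
    print(seg.start, seg.end, seg.end - seg.start + 1, seg.likelihood)
```

In the report above each predicted segment spans 17 residues, and the lines are printed from the most negative likelihood upwards rather than in sequence order, which is why the helper sorts them by start position.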

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD40355 GB:AF036485 hypothetical protein [Plasmid pNZ4000] 
Identities = 39/135 (28%) , Positives = 72/135 (52%) , Gaps = 4/135 (2%) 

Query: 3 KSPARLISFISIAIAINLVGANLAI^LRLPIYLDTIGTLLIAVILGPWYAASTAFLSALI 62 

K A ++ I A+ IN V LA L+LP++L ++GT L +++ GP A + F++ +1 
Sbjct: 15 KLSAATMTLIPAAVGINWAKALAEGLKLPVWLGSLGTFLASMLAGPVAGAISGFINNVI 74 

Query: 63 NWMTTDIFSLYYSPVAIWAIITGILIKRNCKPSS--LLWKSLIISLPGTIIASVITVIL 120 
+T S Y+ +1 + I G+L S+ + +I++ + VI

Sbjct: 75 YGLTLSPISTVYAITSIGIGIAVGV1.HANGWFSSARRVFVSAIIIAIVSAVISTPLNVIF 134 

Query: 121 FKGIT- -SSGSSIIA 133 

+ G T + G S+ A 
Sbjct: 135 WGGQTGIAWGDSLFA 149 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.


Example 1777 

A DNA sequence (GBSx1884) was identified in S.agalactiae <SEQ ID 5525> which encodes the amino
acid sequence <SEQ ID 5526>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1873 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC75223 GB:AE000305 orf, hypothetical protein [Escherichia coli K12] 
Identities = 97/305 (31%), Positives = 160/305 (51%), Gaps = 10/305 (3%) 

Query: 1 ^KEKIIIDCDPGIDDTIJUjOTAIQHPKLEWAITITAGNSPVEI^LKNTFVTLELLNRH 60 

M K KII+DCDPG DD +A+M A +HP ++++ ITI AGN ++ L N + h 

Sbjct: 1 ^KRKIILDTOPGHDrAIAIMNftA^ 59 

Query: 61 DIPVYVGDNLPLQREFVSAQDTHGMDGLGENWFTLAQPIIFQEESADC---FLANYFEHK 117 

++PVY G P+ R+ + A + HG GL F +P+ Q ES . + 

Sbjct: 60 NVPVYAGMPQPIMRQQIVADNIHGETGLDGPVF EPLTRQAESTHAVKYI IDTLMASD 116 

Query: 118 NDTSIIAl^PLTNIARALQTNPKLGKHCKRFISMGGSFKSHGNCSPVAEYNYWCDPHAAQ 177 

D +++ +GPL+NIA A++ P + + + MGG++ + GN +P AE+N + DP AA+ 
Sbjct: 117 GDITLVPVGPLSNIAVAMRMQPAILPKIREIVLMGGAYGT-GNFTPSAEFNIFADPEAAR 175 

Query: 178 YVFENLDKKIEMVGLDITRHIVLTPNHLSYMERINPDVSSFIQKITKFYFDFHWQYEHII 237 

VF + + M+GLD+T V TP+ ++ MER IF ++ + 

Sbjct: 176 WFTS-GVPLVMMGLDLTNQTVCTPDVIARMERAGGPAGELFSDIMNFTLKTQFENYGLA 234 

Query: 238 GCTINDPLAIAYFVNENIATGFDSYTDVACH-GIAKGQTIVDQYHFYKKDANSKILTSVN 296 

G ++D IY+N+ +Y+V+G G+T+ D+ K AN+K+ +++ 

Sbjct: 235 GGPVHDATCIGYLINPDGIKTQE^nrVEVDVNSGPCYGRTVCDELGVLGKPAlJTKVGITID 294 

Query: 297 TNLFW 301 

Sbjct: 295 TDWFW 299 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.


Example 1778 

A DNA sequence (GBSx1885) was identified in S.agalactiae <SEQ ID 5527> which encodes the amino
acid sequence <SEQ ID 5528>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1860 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB62728 GB:AL133423 hypothetical protein SC4A7.24C
[Streptomyces coelicolor A3(2)]
Identities = 36/134 (26%) , Positives = 57/134 (41%) , Gaps = 7/134 (5%) 

Query: 1 MLYEVTSSOTQGVDGKVYLSNGKIVETNHPLNHL PGFNPEELIALAWSTCIiNATIK 56 

+LY 4+ G DG+V +G++ +P + G NPE+L A +S C + 

Sbjct: 8 VLYTAVATAENGRDGRVATDDGRLDWVNPPKEMGGNGAGTNPEQLFAAGYSACFQGALG 67 

Query: 57 AILEQKGFKDljKSRVDVTCQr.MKEKQVGKGFYFQVNAVASIEKLSLSDSKLIVNKAHSRC 116 

+ Q+G S V + K GF V AI + ++++V KAH C 

Sbjct: 68 WARQEGADI SGSTVTAKVGIGKNDD GFGI IVE I SAE I PTVDAATARSL VEKAHQVC 124 

Query: 117 PISKL1SNAKTINL 130 

P SK T+ L 

Sbjct: 125 PYSKATRGNITVTL 138 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.


Example 1779 

A DNA sequence (GBSx1886) was identified in S.agalactiae <SEQ ID 5529> which encodes the amino
acid sequence <SEQ ID 5530>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0531 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9635> which encodes amino acid sequence <SEQ ID 9636> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15482 GB:Z99121 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 164/285 (57%), Positives = 207/285 (72%), Gaps = 2/285 (0%) 

Query: 6 IKLVIVTGMSGAGKTVAIQSFEDLGYFTIDNMPPTLVPKFLELAAQSGDT-SKIAMVVDM 64

I+LVI+TGMSGAGKTVAIQSFEDLGYF +DN+PP+L+PKFLEL +S SK+A+V+D+
Sbjct: 9 IQLVIITGMSGAGKTVAIQSFEDLGYFCVDNLPPSLLPKFLELMKESNSKMSKVALVMDL 68

Query: 65 RSRLFFREINSILDSLEINDNINFKILFLDATDTELVSRYKETRRSHPLAADGRVLDGIS 124

RRFF+ LD+NI +ILFLDA D+ LV+RYKETRRSHPLAA G L+GI+
Sbjct: 69 RGREFFDRLIEALDEMAENPWITPRILFLDAKDSILVTRYKETRRSHPLAATGLPLEGIA 128

Query: 125 LERELLAPLKSMSQNVVDTSELTPRQLRKVISKEFSNQDSQSSFRIEVMSFGFKYGIPLD 184

LERELL LK SQ + DTS++ PR LR+ I K F+ ++ F + VMSFGFKYGIP+D
Sbjct: 129 LERELLEELKGRSQIIYDTSDMKPRDLREKIVKHFATNQGET-FTVNVMSFGFKYGIPID 187

Query: 185 ADLVFDVRFLPNPYYKPELRDKTGLDTEVYDYVMSFDESDDFYDHLLALIKPILPGYQNE 244
ADLVFDVRFLPNPYY +R TG D EV YVM ++E+ F + L+ L+ +LP Y+ E
Sbjct: 188 ADLVFDVRFLPNPYYIESMRPLTGKDKEVSSYVMKWNETQKFNEKLIDLLSFMLPSYKRE 247

Query: 245 GKSVLTVAIGCTGGQHRSTAFAHRLSEDLKADKTVNESHRDKNKR 289
GKS + +AIGCTGGQHRS A L++ K D+ + +HRD KR

Sbjct: 248 GKSQVVIAIGCTGGQHRSVTLAENLADYFKKDYYTHVTHRDIEKR 292

A related DNA sequence was identified in S.pyogenes <SEQ ID 5531> which encodes the amino acid
sequence <SEQ ID 5532>. Analysis of this protein sequence reveals the following:

Possible site: 20

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB15482 GB:Z99121 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 164/291 (56%), Positives = 213/291 (72%), Gaps = 3/291 (1%)



Query: 1 MSDKH-INLVIVTGMSGAGKTVAIQSFEDLGYFTIDNMPPALVPKFLELIEQTNENR-RV 58
+S+ H I LVI+TGMSGAGKTVAIQSFEDLGYF +DN+PP+L+PKFLEL++++N +V
Sbjct: 3 VSESHDIQLVIITGMSGAGKTVAIQSFEDLGYFCVDNLPPSLLPKFLELMKESNSKMSKV 62

Query: 59 ALVVDMRSRLFFKEINSTLDSIESNPSIDFRILFLDATDGELVSRYKETRRSHPLAADGR 118

ALV+D+R R FF + LD + NP I RILFLDA D LV+RYKETRRSHPLAA G
Sbjct: 63 ALVMDLRGREFFDRLIEALDEMAENPWITPRILFLDAKDSILVTRYKETRRSHPLAATGL 122

Query: 119 VLDGIRLERELLSPLKSMSQHVVDTTKLTPRQLRKTISDQFSEGSNQASFRIEVMSFGFK 178
L+GI LERELL LK SQ + DT+ + PR LR+ I F+ + +F + VMSFGFK
Sbjct: 123 PLEGIALERELLEELKGRSQIIYDTSDMKPRDLREKIVKHFATNQGE-TFTVNVMSFGFK 181

Query: 179 YGLPLDADLVFDVRFLPNPYYQVELREKTGLDEDVFNYVMSHPESEVFYKHLLNLIVPIL 238
YG+P+DADLVFDVRFLPNPYY +R TG D++V +YVM E++ F + L++L+ +L
Sbjct: 182 YGIPIDADLVFDVRFLPNPYYIESMRPLTGKDKEVSSYVMKWNETQKFNEKLIDLLSFML 241



An alignment of the GAS and GBS proteins is shown below. 

Identities = 234/296 (79%) , Positives = 263/296 (88%) 

Query: 1 MSDEQIKLVIVTGMSGAGKTVAIQSFEDLGYFTIDNMPPTLVPKFLELAAQSGDTSKIAM 60

MSD+ I LVIVTGMSGAGKTVAIQSFEDLGYFTIDNMPP LVPKFLEL Q+ + ++A+
Sbjct: 1 MSDKHINLVIVTGMSGAGKTVAIQSFEDLGYFTIDNMPPALVPKFLELIEQTNENRRVAL 60

Query: 61 VVDMRSRLFFREINSILDSLEINDNINFKILFLDATDTELVSRYKETRRSHPLAADGRVL 120

VVDMRSRLFF+EINS LDS+E N +I+F+ILFLDATD ELVSRYKETRRSHPLAADGRVL
Sbjct: 61 VVDMRSRLFFKEINSTLDSIESNPSIDFRILFLDATDGELVSRYKETRRSHPLAADGRVL 120

Query: 121 DGISLERELLAPLKSMSQNVVDTSELTPRQLRKVISKEFSNQDSQSSFRIEVMSFGFKYG 180

DGI LERELL+PLKSMSQ+VVDT++LTPRQLRK IS +FS +Q+SFRIEVMSFGFKYG
Sbjct: 121 DGIRLERELLSPLKSMSQHVVDTTKLTPRQLRKTISDQFSEGSNQASFRIEVMSFGFKYG 180

Query: 181 IPLDADLVFDVRFLPNPYYKPELRDKTGLDTEVYDYVMSFDESDDFYDHLLALIKPILPG 240

+PLDADLVFDVRFLPNPYY+ ELR+KTGLD +V++YVMS ES+ FY HLL LI PILP
Sbjct: 181 LPLDADLVFDVRFLPNPYYQVELREKTGLDEDVFNYVMSHPESEVFYKHLLNLIVPILPA 240

Query: 241 YQNEGKSVLTVAIGCTGGQHRSTAFAHRLSEDLKAD/nTVNESHRDKNKRKETVNRS 296

YQ EGKSVLTVAIGCTGGQHRS AFAH Ii+E L DW+VNESHRD+DJ+RKETVNRS
Sbjct: 241 YQKEGKSVLTVAIGCTGGQHRSVAFAHCLAESIATDWSVNESHRDQNRRKETVNRS 296

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1780 

A DNA sequence (GBSx1887) was identified in S.agalactiae <SEQ ID 5533> which encodes the amino
acid sequence <SEQ ID 5534>. Analysis of this protein sequence reveals the following:

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB96620 GB:AJ400630 hypothetical protein [Streptococcus pneumoniae bacteriophage MM1] 
Identities = 254/321 (79%), Positives = 286/321 (88%), Gaps = 1/321 (0%) 

Query: 1 MRKPKITVIGGGTGIPVILKSLRLEDVEITAVVTVADDGGSSGELRSVMQ-LTPPGDLRN 59 

MRKPKITVIGGGTGIPVILKSLR +DVEI A+VTVADDGGSSGELR MQ LTPPGDLRN 
Sbjct: 1 MRKPKITVIGGGTGIPVILKSLREKDVEIAAIVTVADDGGSSGELRKNMQQLTPPGDLRN 60 

Query: 60 VLVALSDMPKFYEQIFQYRFAEGDGDFAGHPLGNLIIAGVAEMQGSTYNAMQSLTQFFHT 119 

VLVA+SDMPKFYE++FQYRF+E G FAGHPLGNLIIAG++EMQGSTYNAMQ L++FFHT 
Sbjct: 61 VLVAMSDMPKFYEKVFQYRFSEDAGAFAGHPLGNLIIAGLSEMQGSTYNAMQLLSKFFHT 120 

Query: 120 TGKIYPSSEHPLTrjHRVFKDGHEWGESQIADYKGMIDHVYVTNTyNEETPTASRKVVDA 179 

TGKIYPSS+HPLTI1HAVF4DG EV GES I D++G+ID+VYVTN N++TP ASR+W 
Sbjct: 121 TGKIYPSSDHPLTLEmVFQDGTEVAGESHIVDHRGIIDNVYVTNALNDDTPLASRRWQT 180 

Query: 180 ILESDMIVLGPGSLFTSILPNLVIPEIKCALLETRA3VAYVCNIMTQRGETEHFTDADHV 239 

ILESDMIVLGPGSLFTSILPN+VI EI +ALLET+AE+AYVCNIMTQRGETEHFTD+DHV 
Sbjct: 181 ILESDMIVLGPGSLFTSILPNIVIKEIGRALLETKAEIAYVCNIMTQRGETEHFTDSDHV 240 

Query: 240 FA^KRHLGQDAIDTVLVNIEKVPESYMENNHFDEYLVQVEHDFSGLRKHARRVISSNFLK 299 

EVL RHLG+ IDTVLVNIEKVP+ YM +N FDEYLVQVEHDF GL K RVISSNFL+ 
Sbjct: 241 EVLHRHLGRPFIDTVLVNIEKVPQEYMNSNRFDEYLVQVEHDFVGLCKQVSRVISSNFLR 300 

Query: 300 LEKGGAFHHGDFWEELMNLV 320 

LE GGAFH GD +V+ELM ++ 
Sbjct: 301 LENGGAFHDGDIjI VDELMRI I 321 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5535> which encodes the amino acid 
sequence <SEQ ID 5536>. Analysis of this protein sequence reveals the following: 

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 251/320 (78%) , Positives = 284/320 (88%) 

Query: 1 MRKPKITVIGGGTGIPVILKSLRLEDVEITAVVTVADDGGSSGELRSVMQLTPPGDLRNV 60

M+ PK+TVIGGGTGI +ILKSLR E V+ITAVVTVADDGGSSGELR+ MQL PPGDLRNV
Sbjct: 1 MKNPKMTVIGGGTGISIILKSLRIIEAVDITAVVTVADDGGSSGELRNAMQLAPPGDL 60

Query: 61 LVALSDMPKFYEQIFQYRFAEGDGDFAGHPLGNLIIAGVAEMQGSTYNAMQSLTQFFHTT 120

Query: 181 LESDMIVLGPGSLFTSILPNLVIPEIKQALLETRAEVAYVCNIMTQRGETEHFTDADHVE 240
LESDMIVLGPGSLFTSILPNLVIPEIK+AL +T+AEV Y+CNIMTQ GETE F+DADHV
Sbjct: 181 LESDMIVLGPGSLFTSILPNLVIPEIKEALRQTKAEV/YICNIMTQYGETEQFSDADHVA 240

GGAFH G+ WEELMNLV
Sbjct: 301 iNGGAFHDGNLWEELMNLV 320

SEQ ID 5534 (GBS269) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 49 (lane 12; MW 35kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 54 (lane 5; MW 60.5kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1781 

A DNA sequence (GBSx1888) was identified in S.agalactiae <SEQ ID 5537> which encodes the amino
acid sequence <SEQ ID 5538>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2479 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB96619 GB:AJ400630 hypothetical protein [Streptococcus pneumoniae bacteriophage MM1] 
Identities = 209/303 (68%) , Positives = 260/303 (84%) 



MSFTV VKEE+LG ++ ELSAIIKMSGS+GL+ GL LS+ TENAK+ARH+Y
Sbjct: 1 MSFWAVKEEILGQHHLSRHELSAIIKMSGSIGLSTSGLTLSVVTENAKLARHLYESFLH 60

EI++HQ++NLRKNRVYTVF +EKV +L+DL LAD+FFG+ETGI+ +IL ++E

GRAYL GAFL+ G++R+P+SGKYQLEI SVYLDHAQ +A+L+++F+LDAKV+E K GAVT

NI KI D +8 + LP DL++VAQ+R+ H PDYS I QQ+ADSL TPL+KSGVNHRLRKINKIA

Query: 301 DEL 303
DEL 

Sbjct: 301 DEL 303 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5539> which encodes the amino acid 
sequence <SEQ ID 5540>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1698 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 222/303 (73%) , Positives = 269/303 (88%) 

Query: 1 MSFTVKVKEELLGHKSENKMELSAIIKMSGSLGLANHGLNLSITTENAKIARHIYSMLEE 60

MSFT KVKEEL+ + + EL+AIIK+SGSLGLA+ L+LSITTENAKIAR+IYS++E+
Sbjct: 1 MSFTTKVKEELIHLSTGDNNELAAIIKLSGSLGLAHQSLHLSITTENAKIARYIYSLIED 60

Query: 61 HYHLQPEIKYHQKTNLRKNRVYTVFIEEKVDVILADLKLADAFFGIETGIEHSILDNDEN 120

Y + PEI+YHQKTNLRKNRVYTV++E+ V+ ILADLKLAD+FFG+ETGIE +L +D
Sbjct: 61 AYVIVPEIRYHQKTNLRKNRVYTVYVEQGVETILADLKLADSFFGLETGIEPQVLSDDNA 120

Query: 121 GRAYLRGAFLSTGTTOEPDSGKYQLEIFSVYLDHAQDIANLMKKFMLDAKVIEHKHGAVT 180

GR+YL+GAFL+ G++R+P+SGKYQLEI+SVYLDHAQDLA LM+KFMLDAK IEHK GAVT
Sbjct: 121 GRSYLKGAFLAAGSIRDPESGKYQLEIYSVYLDHAQDLAQLMQKFMLDAKTIEHKSGAVT 180

Query: 181 YLQKAEDIMDFLIVIDAMES^AFEEIKMIRErRNDINRANNVETANIARTITASMKTIN 240

YLQKAEDIMDFL I + 1 AM ++ FE IK++RE RNDINRANN ETANIA+TI +ASMKTIN
Sbjct: 181 YLQKAEDIMDFLIIIGAMSCKEDFEAIKLLREARNDIHRANNAETANIAKTISASMKTIN 240

Query: 241 NIIKIMDTIGFDADPSDLRQVAQVRVAHPDYSIQQIADSLETPLSKSGVNHRLRKINKIA 300

NIIKIMDTIG ++LP +L+QVAQ+RV HPDYSIQQ+AD+LE P++KSGVNHRLRKINKIA
Sbjct: 241 NIIKIMDTIGLESLPIELQQVAQLRVKHPDYSIQQVADALEFPITKSGVNHRLRKINKIA 300 

Query: 301 DEL 303 
D+L 

Sbjct: 301 DDL 303 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
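
The alignment summaries in these Examples report identities, positives and gaps as count/length pairs with a percentage. A transcribed summary line can be sanity-checked by confirming that every field uses the same alignment length and that each percentage roughly matches its fraction. The following is a sketch only, assuming Python; `check_header` and its `tolerance` parameter are hypothetical names, and the loose 1.5-point tolerance is an assumption, since the rounding convention of the original search program is not stated here.

```python
import re

# Matches fields such as "Identities = 39/135 (28%)"; spacing may vary.
FIELD_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def check_header(line, tolerance=1.5):
    """Return a list of human-readable problems found in one summary line."""
    problems = []
    fields = FIELD_RE.findall(line)
    # Every field of one alignment should share the same alignment length.
    lengths = sorted({int(total) for _, _, total, _ in fields})
    if len(lengths) > 1:
        problems.append("alignment lengths disagree: %s" % lengths)
    # Each stated percentage should be close to count/length.
    for name, count, total, percent in fields:
        count, total, percent = int(count), int(total), int(percent)
        if total == 0 or abs(100.0 * count / total - percent) > tolerance:
            problems.append(
                "%s: %d/%d does not match %d%%" % (name, count, total, percent)
            )
    return problems
```

A line whose fields disagree on the alignment length, for example the made-up line `Identities = 50/120 (50%), Positives = 80/100 (80%)`, is reported with two problems, which is usually a sign of a transcription error rather than a property of the alignment.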

Example 1782 

A DNA sequence (GBSx1889) was identified in S.agalactiae <SEQ ID 5541> which encodes the amino
acid sequence <SEQ ID 5542>. This protein is predicted to be dipeptidase. Analysis of this protein
sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3544 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA86210 GB:Z38063 dipeptidase [Lactobacillus helveticus] 
Identities = 218/473 (46%), Positives = 310/473 (65%), Gaps = 14/473 (2%)

Query: 3 CTTILVGKKASYDGSTMIARTEDSVNGDFTPKHjKVMTSKDQPRHYKSVLSNFEVD h 59

CTTILVGKKAS DGSTMIAR+ED P+ KV+ +DQP+HY SV+S ++D h

Sbjct: 6 CTTILVGKKASIDGSTMIARSSDG-GRVIIPEGFKWNPEDQPKHYTSVISKQKIDDEDL 64

ED +TL LPY+ SA +GV+R+G ++EKYGTYE NG+AFSD + IW+LETIGGHHWIARR+

PDD YV PN+L ID F+F++ +++ +SDLK+ I++YHL+ E +N R+ FGS

KD HYN PR+W + + +P+ P P+ + R I++EDIK+ S HYQD+ YD

I +GINR +T ILQ+R + E GVQWL++G F +M+P +T V

T K + + +W H+L A L D ++ + +++ ++++AQ H

A related DNA sequence was identified in S.pyogenes <SEQ ID 5543> which encodes the amino acid 
sequence <SEQ ID 5544>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0514 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 345/464 (74%), Positives = 407/464 (87%)

Query: 2 ACTTILVGKKASYDGSTMIARTEDSVNGDFTPKIOjKVMTSKDQPRHYKSVLSNFEVDLPD 61
+CTTILVGKKASYDGSTM+ARTEDS NGDFTPKK+ +DQPRHY+SV S+FE+DLPD
Sbjct: 9 SCTTILVGKKASYDGSTMVARTEDSQNGDFTPKKMIWXPEDQPRHYRSVQSSFEMDLPD 68

Query: 62 NPLPYTSVPDALGKDGIWGEAGINSKNVAMSATETITTNSRVLGADPLVSDGIGEEDILT 121
NP+ YTSVPDALGKDGIW EAG+N NVAMSATETITTNSRVLGADPLV+ GIGEED++T
Sbjct: 69 NPMTYTSVPDALGKDGIWAEAGVNEANVAMSATETITTNSRVLGADPLVASGIGEEDMVT 128

Query: 122 LVLPYIQSAREGVERLGAILEKYGTYESNGIAFSDTEEIWWLETIGGHHWIARRVPDDVY 181
LVLPYI+SAREGV RLGAILE YGTYESNG+AFSD + IWWLETIGGHHWIARRVPDD Y

VTNPNQ GIDHFEFNN +DY+CS+DLK+FI+ YHLDLTYS+EHFNPRYAFGSQRDKDR Y

NTPR+W MQ+FLNPSI QDPRS + WCQKPYRKITVED+KYVLS HYQD+ YDPYG EG

Query: 302 DAVSRRAFRSVGINRTSQTSILQLRPNKSLETTGVQWLSYGSMPFATMVPLFTQVETVPN 361

VS++ FR +GINRTSQT+IL +RPNK E +QW++YGSMPF TMVP FTQV+T+P+
Sbjct: 309 TPVSKKVFRPIGINRTSQTAILHIRPNKPQEIAAIQWMAYGSMPFHTMVPFFTQVKTIPD 368

Query: 362 YFSNTTKDASTDNFYWTNRLIAALADPHFyQHEADIESYIERTMAQGHAHINGVDREVAE 421

YF+NT ++ TDNFYWTNRLIAALADPH+ HE D+++Y+E TMA+GHA ++ V+ ++
Sbjct: 369 YFAOTYEOTFTDNFYWTNRLIAALADPHYNHHETDLDNYLEETMAKGHAMLHAVEVQLLA 428

Query: 422 NKEIDFQQKNQEMSDYIQKESQELLNRILFDASNLMTNRFSMGD 465

+ +D +++NQ+MSDY+Q E+Q LLN+ILFDASNLMTNRFS+ D
Sbjct: 429 GETVDLEEENQKMSDYVQGETQTLLNKILFDASNLMTNRFSLSD 472

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1783 

A DNA sequence (GBSx1890) was identified in S.agalactiae <SEQ ID 5545> which encodes the amino
acid sequence <SEQ ID 5546>. Analysis of this protein sequence reveals the following:

Possible site: 15

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA96185 GB:Z71552 AdcA protein [Streptococcus pneumoniae] 
Identities = 257/429 (59%) , Positives = 312/429 (71%) , Gaps = 7/429 (1%) 

Query: 1 MRKKFLLLMSFVAMFAAWQLVQVKQVWADSKLKVVTTFYPVYEFTKNVVGDKADVSMLIK 60

M+K LLL S A+F + Q AD KL +VTTFYPVYEFTK V GD A+V +LI
Sbjct: 1 MKKISLLLASLCALFL---VACSNQKQADGKLNIVTTFYPVYEFTKQVAGDTANVELLIG 57

Query: 61 AGTEPHDFEPSTKNIAAIQDSNAFVYMDDNMETWAPKVAKSVKSKKVTTIKGTGDMLLTK 120

AGTEPH++EPS K +A IQD++ FVY ++NMETW PK+ ++ KKV TIK TGDMLL
Sbjct: 58 AGTEPHEYEPSAKAVAKIQDADTFVYENENMETWVPKLLDTLDKKKVKTIKATGDMLLLP 117

Query: 121 GVEEEGEEHEGHGHEGHHHELDPHVWLSPERAISVVENIRNKFVKAYPKDAASFNKNADA 180

G EEE +H+ HG EGHHHE DPHVWLSP RAI +VE+IR+ YP +F KNA A

Sbjct: 118 GGEEEEGDHD-HGEEGHHHEFDPHVWLSPVRAIKLVEHIRDTLSADYPDKKETFEKNAAA 176

Query: 181 YIAKLKELDKEYKNGLSNAKQKSFVTQHAAFGYMALDYGLNQVPIAGLTPDAEPSSKRLG 240

YI KL+ LDK Y GLS AK+KSFVTQHAAF Y+ALDYGL QV I+GL+PDAEPS+ RL
Sbjct: 177 YIEKLQSLDKAYAEGLSQAKEKSFVTQHAAFNYLALDYGLKQVAISGLSPDAEPSAARLA 236

Query: 241 ELAKYIKKYNINYIYFEENASNKVAKTLADEVGVKTAVLSPLEGLSKKEMAAGEDYFSVM 300

EL +Y+KK I YIYFEENAS +A TL+ E GVKT VL+PLE L++++ AGE+Y SVM
Sbjct: 237 ELTEYVKKNKIAYIYFEENASQALANTLSKEAGVKTDVLNPLESLTEEDTKAGENYISVM 296

Query: 301 RRNLKVLKKTTDVAGKEVAPEE-DKTKTVETGYFKTKDVKDRKLTDYSGNWQSVYPLLQD 359

+NLK LK+TTD G + PE+ + TKTV+ GYF+ VKDR L+DY+GNWQSVYP L+D
Sbjct: 297 EKNLKALKQTTDQEGPAIEPEKAEDTKTVQNGYFEDAAVKDRTLSDYAGNWQSVYPFLED 356

Query: 360 GTLDPVWDYKAKSKKDMTAAEYKKYYTAGYKTDVESIKIDGKKHQMTFVRNGKSQTFTYK 419

GT D V+DYKAK MT AEYK YYT GY+TDV II + M FV+ G+S+ +TYK
Sbjct: 357 GTFDQVFDYKAKLTGKMTQAEYKAYYTKGYQTDVTKINI--TDNTMEFVQGGQSKKYTYK 414

Query: 420 YAGYKILTY 428

Y G KILTY
Sbjct: 415 YVGKKILTY 423

A related DNA sequence was identified in S.pyogenes <SEQ ID 5547> which encodes the amino acid
sequence <SEQ ID 5548>. Analysis of this protein sequence reveals the following: 

Possible site: 17
>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAA96185 GB:Z71552 AdcA protein [Streptococcus pneumoniae] 
Identities = 259/438 (59%) , Positives = 326/438 (74%) , Gaps = 16/438 (3%) 

Query: 1 MKKKILLMMSLISVFFAWQLTQAKQVLAEGKVKVVTTFYPVYEFTKGVIGNDGDVFMLMK 60

MKK LL+ SL ++F + + Q A+GK+ +VTTFYPVYEFTK V G+ +V +L+
Sbjct: 1 MKKISLLLASLCALFL---VACSNQKQADGKLNIVTTFYPVYEFTKQVAGDTANVELLIG 57

Query: 61 AGTEPHDFEPSTKDIKKIQDADAFVYMDDNMETWVSDVKKSLTSKKVTIVKGTGNMLLVA 120

AGTEPH++EPS K + KIQDAD FVY ++NMETWV + +L KKV +K TG+MLL+
Sbjct: 58 AGTEPHEYEPSAKAVAKIQDADTFVYENENMETWVPKLLDTLDKKKVKTIKATGDMLLLP 117

Query: 121 GAGHDHPHEDADKKHEHNKHSEEGHNHAFDPHVWLSPYRSITVVENIRDSLSKAYPEKAE 180

G E+ + H+H EEGH+H FDPHVWLSP R+I +VE+IRD+LS YP+K E

Sbjct: 118 GG EEEEGDHDHG- - -EEGHHHEFDPHVWLSPVRAIKLVEHIRDTLSADYPDKKE 168

Query: 181 NFKANAATYIEKLKELDKDYTAALSDAKQKSFVTQHAAFGYMALDYGLNQISINGVTPDA 240

F+ NAA YIEKL+ LDK Y LS AK+KSFVTQHAAF Y+ALDYGL Q++I+G++PDA
Sbjct: 169 TFEKNAAAYIEKLQSLDKAYAEGLSQAKEKSFVTQHAAFNYLALDYGLKQVAISGLSPDA 228

Query: 241 EPSAKRIATLSKYVKKYGIKYIYFEENASSKVAKTLAKEAGVKAAVLSPLEGLTEKEMKA 300

EPSA R+A L++YVKK I YIYFEENAS +A TL+KEAGVK VL+PLE LTE++ KA
Sbjct: 229 EPSAARLAELTEYVKKNKIAYIYFEENASQALANTLSKEAGVKTDVLNPLESLTEEDTKA 288

Query: 360 SVYPYLQDGTLDQVWDYKAKKSKGKMTAAEYKDYYTTGYKTDVEQIKINGKKKTMTFVRN 419

SVYP+L+DGT DQV+DYKAK + GKMT AEYK YYT GY+TDV KIN TM FV+
Sbjct: 349 SVYPFLEDGTFDQVFDYKAKLT-GKMTQAEYKAYYTKGYQTDV--TKINITDNTMEFVQG 405

Query: 420 GEKKTFTYTYAGKEILTY 437

G+ K +TY Y GK+ILTY
Sbjct: 406 GQSKKYTYKYVGKKILTY 423

An alignment of the GAS and GBS proteins is shown below. 

Identities = 353/515 (68%) , Positives = 422/515 (81%) , Gaps = 9/515 (1%) 

Query: 1 MRKKFLLLMSFVAMFAAWQLVQVKQVWADSKLKVVTTFYPVYEFTKNVVGDKADVSMLIK 60

M+KK LL+MS +++F AWQL Q KQV A+ K+KVVTTFYPVYEFTK V+G+ DV ML+K
Sbjct: 1 MKKKILLMMSLISVFFAWQLTQAKQVLAEGKVKVVTTFYPVYEFTKGVIGNDGDVFMLMK 60

Query: 61 AGTEPHDFEPSTKNIAAIQDSNAFVYMDDNMETWAPKVAKSVKSKKVTTIKGTGDMLLTK 120

AGTEPHDFEPSTK+I IQD++AFVYMDDNMETW V KS+ SKKVT +KGTG+MLL
Sbjct: 61 AGTEPHDFEPSTKDIKKIQDADAFVYMDDNMETWVSDVKKSLTSKKVTIVKGTGNMLLVA 120

Query: 121 GV EEEGEEHEGHGHEGHHHELDPHVWLSPERAISVVENIRNKFVKAYPKDAA 172

G ++ EH H EGH+H DPHVWLSP R+I+VVENIR+ KAYP+ A

Sbjct: 121 GAGHDHPHEDADKKHEHNKHSEEGHNHAFDPHVWLSPYRSITVVENIRDSLSKAYPEKAE 180

Query: 173 SFNKNADAYIAKLKELDKEYKNGLSNAKQKSFVTQHAAFGYMALDYGLNQVPIAGLTPDA 232
+F NA YI KLKELDK+Y LS+AKQKSFVTQHAAFGYMALDYGLNQ+ I G+TPDA
Sbjct: 181 NFKANAATYIEKLKELDKDYTAALSDAKQKSFVTQHAAFGYMALDYGLNQISINGVTPDA 240

Query: 233 EPSSKRLGELAKYIKKYNINYIYFEENASNKVAKTLADEVGVKTAVLSPLEGLSKKEMAA 292

EPS+KR+ L+KY+KKY I YIYFEENAS+KVAKTLA E GVK AVLSPLEGL++KEM A
Sbjct: 241 EPSAKRIATLSKYVKKYGIKYIYFEENASSKVAKTLAKEAGVKAAVLSPLEGLTEKEMKA 300

Query: 293 GEDYFSVMRRNLKVLKKTTDVAGKEVAPEEDKTKTVETGYFKTKDVKDRKLTDYSGNWQS 352

G+DYF+VMR+NL+ L+ TTDVAGKE+ PE+D TKTV GYFK K+VKDR+L+D+SG+WQS
Sbjct: 301 GQDYFTVMRKNLETLRLTTDVAGKEILPEKDTTKTVYNGYFKDKEVKDRQLSDWSGSWQS 360

Query: 353 VYPLLQDGTLDPVWDYKA-KSKKDMTAAEYKKYYTAGYKTDVESIKIDGKKHQMTFVRNG 411

VYP LQDGTLD VWDYKA KSK MTAAEYK YYT GYKTDVE IKI+GKK MTFVRNG
Sbjct: 361 VYPYLQDGTLDQVWDYKAKKSKGKMTAAEYKDYYTTGYKTDVEQIKINGKKKTMTFVRNG 420

Query: 412 KSQTFTYKYAGYKILTYKKGNRGVRYLFEAKEKDAGQFKYIQFSDHGIKPNKAEHFHIFW 471

+ +TFTY YAG +ILTY KGNRGVR++FEAKE DAG+FKY+QFSDH I P KA+HFH++W
Sbjct: 421 EKKTFTYTYAGKEILTYPKGNRGVRFMFEAKEADAGEFKYVQFSDHAIAPEKAKHFHLYW 480

Query: 472 G

G +SQEKL +E+E+WPTY+ + +SGRE+AQ++ +H

Sbjct: 481 GGDSQEKLHKELEHWPTYYGSDLSGREIAQEINAH 515

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

A related GBS gene <SEQ ID 8899> and protein <SEQ ID 8900> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 3
SRCFLG: 0
McG: Length of UR: 19
Peak Value of UR: 2.79
Net Charge of CR: 3
McG: Discrim Score: 9.08
GvH: Signal Score (-7.5): 2.59
Possible site: 15
>>> Seems to have a cleavable N-term signal seq.
Amino Acid Composition: calculated from 16
ALOM program count: 0 value: 7.69 threshold: 0.0
PERIPHERAL Likelihood = 7.69 264
modified ALOM score: -2.04

*** Reasoning Step: 3

Rule gpol

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

3 758895 |emb|CAA96185.1| |Z71552 AdcA protein {Streptococcus pneumoniae} >PIR | T46756 | T46756
Zn-binding lipoprotein 

adcA [imported] - Streptococcus pneumoniae (fragment) 



Query: 1 MRKKFLLLMSFVAMFAAWQLVQVKQVWADSKLKVVTTFYPVYEFTKNVVGDKADVSMLIK 60

M+K LLL S A+F + Q AD KL +VTTFYPVYEFTK V GD A+V +LI
Sbjct: 1 MKKISLLLASLCALFL---VACSNQKQADGKLNIVTTFYPVYEFTKQVAGDTANVELLIG 57

Query: 61 AGTEPHDFEPSTKNIAAIQDSNAFVYMDDNMETWAPKVAKSVKSKKVTTIKGTGDMLLTK 120
AGTEPH++EPS K +A IQD++ FVY ++NMETW PK+ ++ KKV TIK TGDMLL
Sbjct: 58 AGTEPHEYEPSAKAVAKIQDADTFVYENENMETWVPKLLDTLDKKKVKTIKATGDMLLLP 117

Query: 121 GVEEEGEEHEGHGHEGHHHELDPHVWLSPERAISVVENIRNKFVKAYPKDAASFNKNADA 180

G EEE +H+ HG EGHHHE DPHVWLSP RAI +VE+IR+ YP +F KNA A

Sbjct: 118 GGEEEEGDHD-HGEEGHHHEFDPHVWLSPVRAIKLVEHIRDTLSADYPDKKETFEKNAAA 176

Query: 181 YIAKLKELDKEYKNGLSNAKQKSFVTQHAAFGYMALDYGLNQVPIAGLTPDAEPSSKRLG 240

YI KL+ LDK Y GLS AK+KSFVTQHAAF Y+ALDYGL QV I+GL+PDAEPS+ RL
Sbjct: 177 YIEKLQSLDKAYAEGLSQAKEKSFVTQHAAFNYLALDYGLKQVAISGLSPDAEPSAARLA 236

Query: 241 ELAKYIKKYNINYIYFEENASNKVAKTLADEVGVKTAVLSPLEGLSKKEMAAGEDYFSVM 300

EL +Y+KK I YIYFEENAS +A TL+ E GVKT VL+PLE L++++ AGE+Y SVM
Sbjct: 237 ELTEYVKKNKIAYIYFEENASQALANTLSKEAGVKTDVLNPLESLTEEDTKAGENYISVM 296

Query: 301 RRNLKVLKKTTDVAGKEVAPEE-DKTKTVETGYFKTKDVKDRKLTDYSGNWQSVYPLLQD 359

+NLK LK+TTD G + PE+ + TKTV+ GYF+ VKDR L+DY+GNWQSVYP L+D
Sbjct: 297 EKNLKALKQTTDQEGPAIEPEKAEDTKTVQNGYFEDAAVKDRTLSDYAGNWQSVYPFLED 356

Query: 360 GTLDPVWDYKAKSKKDMTAAEYKKYYTAGYKTDVESIKIDGKKHQMTFVRNGKSQTFTYK 419

GT D V+DYKAK MT AEYK YYT GY+TDV II + M FV+ G+S+ +TYK
Sbjct: 357 GTFDQVFDYKAKLTGKMTQAEYKAYYTKGYQTDVTKINI--TDNTMEFVQGGQSKKYTYK 414

Query: 420 YAGYKILTY 428 

Y G KILTY 
Sbjct: 415 YVGKKILTY 423 
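
The identity and positive counts quoted for each alignment can be recomputed from the aligned rows themselves. The following is a minimal sketch, assuming Python and the usual midline convention (a letter for an identical pair, "+" for a conservative substitution, a blank for a mismatch); `score_alignment` is a hypothetical helper name, and the three strings must have equal length, which scanned text does not always preserve.

```python
def score_alignment(query, midline, subject):
    """Count identities, positives and gap columns for one aligned block.

    Returns (identities, positives, gaps, columns). Positives are taken as
    identities plus the "+" marks on the midline.
    """
    if not (len(query) == len(midline) == len(subject)):
        raise ValueError("aligned strings must have equal length")
    # An identity is a column where both rows carry the same residue.
    identities = sum(
        1 for q, s in zip(query, subject) if q == s and q != "-"
    )
    positives = identities + midline.count("+")
    # A gap column has "-" in either row.
    gaps = sum(1 for q, s in zip(query, subject) if q == "-" or s == "-")
    return identities, positives, gaps, len(query)


# Last block of the alignment above (Query 420-428 against Sbjct 415-423).
print(score_alignment("YAGYKILTY", "Y G KILTY", "YVGKKILTY"))  # (7, 7, 0, 9)
```

Summing the per-block counts over all blocks of an alignment gives totals that can be compared with the reported identities and positives.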

SEQ ID 8900 (GBS325) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 65 (lane 3; MW 58kDa). 

The GBS325-His fusion product was purified (Figure 210, lane 7) and used to immunise mice. The 
resulting antiserum was used for Western blot (Figure 257A) and FACS (Figure 257B). These tests confirm 
that the protein is immunoaccessible on GBS bacteria. 

Example 1784 

A DNA sequence (GBSx1891) was identified in S.agalactiae <SEQ ID 5549> which encodes the amino
acid sequence <SEQ ID 5550>. This protein is predicted to be ribosomal protein L31 (rl31). Analysis of this
protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1948 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9637> which encodes amino acid sequence <SEQ ID 9638> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAFB0389 GB:AF160251 ribosomal protein L31 [Listeria innocua] 
Identities = 61/81 (75%) , Positives = 71/81 (87%) , Gaps = 1/81 (1%) 

Query: 9 MKKDIHPDYRPVVFLDTTTGYKFLSGSTKSTKETVEFE-GETYPLIRVEISSDSHPFYTG 67 

MK IHP+YRPVVF+DT+T +KFLSGSTKS+ ET+++E G YPL+RVEISSDSHPFYTG
Sbjct: 1 MKTGIHPEYRPVVFVDTSTDFKFLSGSTKSSSETIKWEEGMEYPLLRVEISSDSHPFYTG 60

Query: 68 RQKFTCADGRVDRFNKKYGLK 88

+QK ADGRVDRFNKKYGLK
Sbjct: 61 KQKHATADGRVDRFNKKYGLK 81

A related DNA sequence was identified in S.pyogenes <SEQ ID 5551> which encodes the amino acid
sequence <SEQ ID 5552>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1910 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 81/86 (94%) , Positives = 86/86 (99%) 

Query: 9 MKKDIHPDYRPVVFLDTTTGYKFLSGSTKSTKETVEFEGETYPLIRVEISSDSHPFYTGR 68

M+KDIHPDYRPVVFLDTTTGY+FLSGSTK++KETVEFEGETYPLIRVEISSDSHPFYTGR
Sbjct: 1 MRKDIHPDYRPVVFLDTTTGYQFLSGSTKASKETVEFEGETYPLIRVEISSDSHPFYTGR 60

Query: 69 QKFTQADGRVDRFNKKYGLKDANAAQ 94 

QKFTCADGRVDRFNKKYGLKDANAA+ 
Sbjct: 61 QKFTQADGRVDRFNKKYGLKDANAAK 86 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1785 

A DNA sequence (GBSx1892) was identified in S.agalactiae <SEQ ID 5553> which encodes the amino
acid sequence <SEQ ID 5554>. This protein is predicted to be aspartate aminotransferase (aspC). Analysis
of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1740 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9421> which encodes amino acid sequence <SEQ ID 9422> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC21948 GB:U32714 aminotransferase [Haemophilus influenzae Rd] 
Identities = 200/323 (61%) , Positives = 264/323 (80%) , Gaps = 1/323 (0%) 

Query: 1 MQYYQLQNI-HVDMDDIYIVNGVSEGISMSMQALLDNDDEVLVPMPDYPLWTACVSLAGG 59
+QYYQ + I ++D+YI NGVSE I+M+MQALL++ DEVLVPMPDYPLWTA V+L+GG

AVHY+CDE+ANW+P IDDIK+K+ +KTKAIV+INPNNPTGAVY +E+LQEIV+IARQN+

LIIF+DE+YD+++ DG H IA++A D+ TVTL+GLSK++R+ GFR GWM+L+GP+ H

KGYIEGL+MLA+MRLC+NV Q IQT+LGG QSI+ +LPGGR+ EQRN


Query: 240 GLSAVKPWAGLYLFPKIDTDMYRID^EEF\™FLKQEKVLLTHGRGFNMNTADHFRIVY 299 

G++ VKP +Y+FPKID + I +DE+ VL+ L+QEKVLL HG+GFN ++ DHFRIV 
Sbjct: 322 GITCVKPMGAIOTFPKIDVKKFNIHSDEKMVLDLLRQEKVLLVHGKGFNWHSPDHFRIVT 381 

Query: 300 LPRVDELTELQEKMARFLSQYKR 322 

LP V++L E K+ARFLS Y++ 
Sbjct: 382 LPYVMQLEEAITKLARFLSDYRQ 404 

There is also homology to SEQ ID 3662. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1786 

A DNA sequence (GBSxl893) was identified in S.agalactiae <SEQ ID 5555> which encodes the amino 
acid sequence <SEQ ID 5556>. Analysis of this protein sequence reveals the following: 

Possible site: 52

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.02 Transmembrane 164 - 180 ( 163 - 181)

Final Results

bacterial membrane --- Certainty=0.1808 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
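
The "Final Results" block above lists a certainty score for each of three candidate localizations. As a minimal sketch of how such a block could be read programmatically, the following Python snippet extracts the three scores and picks the highest one. The function names are hypothetical, and the regular expression is an assumption that tolerates the stray spaces found inside the numbers in this text; it is not part of the original analysis pipeline.

```python
import re

# Tolerates stray spaces inside the number, e.g. "0 .1808" or "0. 0000".
RESULT_RE = re.compile(
    r"bacterial\s+(membrane|outside|cytoplasm)\b.*?"
    r"Certainty\s*=\s*([\d.\s]+?)\s*\((.+?)\)"
)

def parse_final_results(text):
    """Return {location: (certainty, verdict)} for each 'bacterial ...' line."""
    results = {}
    for line in text.splitlines():
        m = RESULT_RE.search(line)
        if m:
            location, raw, verdict = m.groups()
            results[location] = (float(raw.replace(" ", "")), verdict.strip())
    return results

def predicted_location(results):
    """Pick the location with the highest certainty score."""
    return max(results, key=lambda loc: results[loc][0])

sample = """
bacterial membrane Certainty=0 .1808 (Affirmative) < succ>
bacterial outside --- Certainty=0 .0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0 .0000 (Not Clear) < succ>
"""
results = parse_final_results(sample)
print(predicted_location(results), results["membrane"])
```

Only the localization marked "Affirmative" carries a non-zero score in these blocks, so taking the maximum is equivalent to selecting the affirmative line.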

A related GBS nucleic acid sequence <SEQ ID 10099> which encodes amino acid sequence <SEQ ID
10100> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06181 GB:AP001515 transcriptional pleiotropic repressor 
[Bacillus halodurans] 
Identities = 129/257 (50%), Positives = 181/257 (70%), Gaps = 3/257 (1%) 

Query: 23 NLLEKTRKITSILQRSVDSLDAELPYNTMAAQLADIIDCNACIINGGGNLLGYAMKYKTN 82

+LL + RKI +LQ+S + + MA L D+I N +++ G LLG+A+K + 

Sbjct: 2 SLLSRMRKINDMLQKSG VQ HVNFREMAETLRDVI SANI FWSRRGKLLGFAI KQEIE 58 

Query: 83 TDRVEEFFETKQFPDYYVKSASRVYDTEANLSVDNDLSIFPVETKENFQDGITTIAPIYG 142 

+R++4- E +QFP+ Y +V +T ANL ++++ + FPVE KE F+ G+TTI PI G 

Sbjct: 59 NERMKKMLEDRQFPEEYTTGLFKVEETSANLDINSEFTAFPVENKELFKTGLTTIVPISG 118 

Query: 143 GGMRLGTFIIWraTOKEFSDDDLILWIASTWGIQLLNLQTENLEENIRKQTAVTMAINT 202 

GG RLGT 1+ R + F+DDDLIL E +TWG+++L+ +T+ +EE R + V MAI++ 
Sbjct: 119 GGQRLGTLILARLNDSFNDDDL1LAEYGATVVGMEILHEICTQEIEEEARSKAWQMAISS 178 

Query: 203 LSYSEMKAVAAILGELDGLEGRLTASVIADRIGITRSVIVNALRKLESAGIIESRSLGMK 262 

LSYSE++AV I ELDG EG L AS IADR+GITRSVIVNALRKLESAG+IESRSLGMK 
Sbjct: 179 LSYSELEAVEHIFEELDGKEGLLVASKIADRVGITRSVIVNALRKLESAGVIESRSLGMK 238 

Query: 263 GTYLKVINEGIFDKLKE 279 

GTY+KV+N+ +L++ 
Sbjct: 239 GTYIKVLNDKFLVELEK 255 
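
Each database hit above is summarized by a line of the form "Identities = a/n (p%), Positives = b/n (q%), Gaps = c/n (r%)". The sketch below shows one way to extract those counts in Python and recompute a percentage from them. The function names are hypothetical, and the use of truncation rather than rounding is an assumption that happens to reproduce the values on the sample line; it is not a statement about how the alignment program itself computes them.

```python
import re

# Matches "Identities = 129/257 (50%)" and tolerates extra spaces.
STAT_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)

def parse_alignment_stats(line):
    """Return {name: (count, aligned_length, printed_percent)}."""
    return {
        name.lower(): (int(num), int(den), int(pct))
        for name, num, den, pct in STAT_RE.findall(line)
    }

def truncated_percent(count, length):
    """Integer percentage, truncated toward zero (an assumption, see above)."""
    return (100 * count) // length

line = "Identities = 129/257 (50%), Positives = 181/257 (70%), Gaps = 3/257 (1%)"
stats = parse_alignment_stats(line)
for name, (count, length, printed) in stats.items():
    print(name, count, length, printed, truncated_percent(count, length))
```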

A related DNA sequence was identified in S.pyogenes <SEQ ID 5557> which encodes the amino acid 
sequence <SEQ ID 5558>. Analysis of this protein sequence reveals the following: 

Possible site: 38 

erminal signal sequence 

144 - 160 ( 143 - 161) 

Final Results

bacterial membrane --- Certainty=0.1256 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:CAB13490 GB:Z99112 transcriptional regulator [Bacillus subtilis] 
Identities = 131/255 (51%) , Positives = 179/255 (69%) , Gaps = 3/255 (1%) 

Query: 4 LLEKTRKITSILQRSVDSLETELPYNTMASRLADIIDCNACIINGGGTLLGYAMKYKTNT 63 

LL+KTR 1 S+LQ + + + MA L D+ID N +++ G LLGY++ + 

Sbjct: 3 LLQKTRI INSMLQAAAGK PVNFKEMAETLRDVIDSNIFWSRRGKLLGYSINQQIEN 59 

Query: 64 DRVEEFFEAKQFPDTYVKAASRVYDTEANLSVENELTIFPVESKDTYPGGLTTIAPIYGG 123

DR+++ E +QFP+ Y K V +T +NL + +E T FPVE++D + GLTTI PI GG
Sbjct: 60 DRMKKMLEDRQFPEEYTKNLFNVPETSSNLDINSEYTAFPVENRDLFQAGLTTIVPIIGG 119

Query: 124 GMRLGSLIIWRNDNEFSDDDLILWISSTWGIQLLNLQTENLEDTIRKQTAVNMAINTL 183 

G RLG+LI+ R ++F4DDDLIL E +TWG+++L + E +E+ R + V MAI++L 
Sbjct: 120 GERLGTLILSRLQDQFNDDDLILAEYGATWGMEILREKAEEIEEEARSKAWQMAISSL 179 

Query: 184 SYSEMKAVAAILGELDGNEGRLTASVIADRIGITRSVIVNALRKLESAGIIESRSLGMKG 243

SYSE++A+ I ELDGNEG L AS IADR+GITRSVI VNALRKLESAG+ IESRSLGMKG 
Sbjct: 180 SYSELEAIEHIFEELDGNEGLLVASKIADRVGITRSVIVNALRKLESAGVIESRSLGMKG 239 

Query: 244 TYLKVINEGI FAKLK 258 

TY+KV+N +L+ 
Sbjct: 240 TYIKVLNNKFLIELE 254 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 232/260 (89%), Positives = 247/260 (94%) 

Query: 21 MPNLLEKTRKITSILQRSVDSLDAELPYNTMAAQLADIIDCNACIINGGGNLLGYAMKYK 80 

MPNLLEKTRKITSILQRSVDSL+ ELPYNTMA++LADIIDCNACIINGGG LLGYAMKYK
Sbjct: 1 MPNLLEKTRKITSILQRSVDSLETELPYNTMASRLADIIDCNACIINGGGTLLGYAMKYK 60 

Query: 81 TNTDRVEEFFETKQFPDYYVKSASRVYDTEANLSVDNDLSIFPVETKENFQDGITTIAPI 140 

TNTDRVEEFFE KQFPD YVK+ASRVYDTEANLSV+N+L+IFPVE+K+ + G+TTIAPI
Sbjct: 61 TNTDRVEEFFEAKQFPDTYVKAASRVYDTEANLSVENELTIFPVESKDTYPGGLTTIAPI 120 

Query: 141 YGGGMRLGTFIIWRNDKEFSDDDLILVEIASTWGIQLLNLQTENLEENIRKQTAVTMAI 200 

YGGGMRLG+ IIWRND EFSDDDLILVEI+STWGIQLLNLQTENLE+ IRKQTAV MAI 
Sbjct: 121 YGGGMRLGSLIIWRNDNEFSDDDLILVEISSTWGIQLLNLQTENLEDTIRKQTAVNMAI 180 

Query: 201 NTLSYSEMKAVAAILGELDGLEGRLTASVIADRIGITRSVIVNALRKLESAGIIESRSLG 260

NTLSYSEMKAVAAILGELDG EGRLTASVIADRIGITRSVIVNALRKLESAGIIESRSLG
Sbjct: 181 NTLSYSEMKAVAAILGELDGNEGRLTASVIADRIGITRSVIVNALRKLESAGIIESRSLG 240 

Query: 261 MKGTYLKVINEGIFDKLKEY 280 

MKGTYLKVINEGIF KLKE+ 
Sbjct: 241 MKGTYLKVINEGIFAKLKEF 260
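
The "Identities" figure reported for an alignment is the number of aligned positions at which the query and subject rows carry the same residue. As a small illustration, the Python sketch below counts identities for the final row of the GAS/GBS alignment above (query positions 261-280), with the subject row taken to agree with the match line. The function name is hypothetical, and counting "Positives" is deliberately left out because it would require a substitution matrix that this text does not specify.

```python
def count_identities(query, sbjct):
    """Count aligned columns where both rows carry the same residue."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    # A column of two gap characters is not an identity.
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")

query = "MKGTYLKVINEGIFDKLKEY"
sbjct = "MKGTYLKVINEGIFAKLKEF"

identities = count_identities(query, sbjct)
print(f"{identities}/{len(query)} identical positions")
```

Summing this count over every row of an alignment gives the numerator of the "Identities" fraction, provided the rows have been transcribed without error.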



A related GBS gene <SEQ ID 8901> and protein <SEQ ID 8902> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 9
McG: Discrim Score: -6.84

GvH: Signal Score (-7.5): -5.37

Possible site: 13
>>> Seems to have no N-terminal signal sequence
ALOM program count: 1 value: -2.02 threshold: 0.0
INTEGRAL Likelihood = -2.02 Transmembrane 114 - 130 ( 113 - 131)

PERIPHERAL Likelihood = 3.61 179
modified ALOM score: 0.90
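
The ALOM lines above report one candidate transmembrane segment with a likelihood of -2.02 against a threshold of 0.0. The following Python sketch shows how such lines could be parsed and how segments scoring below the threshold could be selected. The class and function names are hypothetical, and treating "likelihood below threshold" as the criterion for an integral segment is an assumption based on the printed threshold and count, not a description of the program's internals.

```python
import re
from dataclasses import dataclass

@dataclass
class Segment:
    likelihood: float
    start: int
    end: int

    @property
    def length(self):
        # Residue positions are inclusive at both ends.
        return self.end - self.start + 1

SEGMENT_RE = re.compile(
    r"Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)

def parse_segments(text):
    """Extract every 'Likelihood = x Transmembrane a - b' entry."""
    return [
        Segment(float(score), int(start), int(end))
        for score, start, end in SEGMENT_RE.findall(text)
    ]

def integral_segments(segments, threshold=0.0):
    """Keep segments whose likelihood falls below the threshold."""
    return [seg for seg in segments if seg.likelihood < threshold]

sample = (
    "INTEGRAL Likelihood = -2.02 Transmembrane 114 - 130 ( 113 - 131)\n"
    "PERIPHERAL Likelihood = 3.61 179\n"
)
segments = parse_segments(sample)
print(segments, [seg.length for seg in integral_segments(segments)])
```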



*** Reasoning Step: 3 

Final Results

bacterial membrane --- Certainty=0.1808 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF02556(223 - 987 of 1293) 

EGAD|13275|BS1617(4 - 255 of 259) cody protein {Bacillus subtilis} OMNI|NT01BS1895 cody
protein (vegetative protein 286b) (veg286b) GP|5353S1|gb|AABD3372.1||U13634 CodY {Bacillus
subtilis} GP|2633989|emb|CAB13490.1||Z99112 transcriptional regulator {Bacillus subtilis}
PIR|S61496|S61496 transcription pleiotropic repressor codY - Bacillus subtilis
%Match = 29.1
%Identity = 50.6 %Similarity = 71.5
Matches = 128 Mismatches = 71 Conservative Sub.s = 53

177 207 237 267 297 327 357 387 

DCKS*NALI*L*RKTYKG*RKCRIYLEKTRKITSILQRSVDSLDAEL?YOT 

hin i mi = ■■ ■■ II I 101 I = = = I 1111== 

20 MALLQKTRI INSMLQAAAGK PVNFKEMAETIiRDVIDSNIFWSRRGKLLGYSINQ 

10 20 30 40 50 

417 447 477 507 537 567 597 627 

KTNTDRVEEFFETKQFPDYWKSASRWDTEANLSVDNDL 
25 : ||::: :| :|||: | |: | : | : | | : : : : : | | \ \ : : | | | : | | | | | | | | | | || : | : | :| 

qi™drmkkmledrqfpeeytknlfnvpetssnldinseytafpvenrdlfqaglttivpiigggerlgtlilsrlqdqf 

70 80 90 100 110 120 130 

657 687 717 747 777 807 837 867 

30 SDDDLILVEIASTWGIQLLNLQTE!^EENIRKQ"AVT^INTLSYSEMKAVAAILGELDGLEGRLTASVIADRIGITRS 
= 111111 I :||||:::| =1=11 I = I I I I = = I I I I I = = I = 1= llll II I II 1111 = 11111 
NDDDLILAEYGATWGmiLREKAEEIEEEARSKAWQMAISSLSYSELFAIEElFEELDGNEGLLVASKIADRVGITRS 
150 160 170 180 190 200 210 

35 897 927 957 987 1017 1047 1077 1107 

VI VNALRKLESAGI IESRSLGMKGTYLKVINEGIFDKLKEYN* S*HGTGSSFQFLFWNQEEIRRKMTXXN* LXXLFS*RL 
lllllllimil=llllllllllll=ll=l = =1= 
VTVNALRKLESAGVIESRSLGMKGlYIKVLMfKFLIELENlKSH 
230 240 250 

SEQ ID 8902 (GBS431) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 173 (lane 7; MW 54kDa). It was also expressed in E.coli as a His-fusion product.
SDS-PAGE analysis of total cell extract is shown in Figure 77 (lane 6; MW 29kDa).

GBS431-GST was purified as shown in Figure 223, lane 8. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1787 

A DNA sequence (GBSx1894) was identified in S.agalactiae <SEQ ID 5559> which encodes the amino
acid sequence <SEQ ID 5560>. This protein is predicted to be isochorismatase. Analysis of this protein 
sequence reveals the following: 

Possible site: 35

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.81 Transmembrane 126 - 142 ( 125 - 142)

Final Results

bacterial membrane --- Certainty=0.2126 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB15164 GB:Z99120 similar to pyrazinamidase/nicotinamidase 
[Bacillus subtilis] 
Identities = 99/181 (54%) , Positives = 132/181 (72%) 

Query: 1 MTKALISIDYTYDFVADDGKLTAGKPAQSIASAIADVTEKAYRSGDYIFFAIDNHDIGDV 60

M KALI IDYT DFVA DGKLT G+P + I AI ++T++ +GDY+ A+D+HD GD 
Sbjct: 1 MKKALlCIDYTiroFvASDGKLTCGE^ 60 

Query: 61 FHPESNLFPEHNIKGTSGRNLYGPLGTLYETIKEDSRVFWIDKRHYSAFSGTDLDIRLRE 120 

+HPE+ LFP HNIKGT G4+LYG L MT+ + + V++++K YSAF4GTDL444LRE 
Sbjct: 61 YHPETRLFPPHNIKGTEGKDLYGKLLPLYQKEEHEPNVYYMEKTRYSAFAGTDLELKLRE 120 

Query: 121 RRvDTLILTGVLTDICVLHTAIDAYNLGYKIEVPAAAVASLNDSlfflQWALNHFKTVLGATI 181 

R++ L L GV TDICVLHTA+DAYN G++I V AVAS N H WAL+HF +GA + 
Sbjct: 121 RQIGELHLAGVCTDICVLHTAVDAYNKGFRIWHKQAVASFNQEGHAWALSHFANSIGAQV 181 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5561> which encodes the amino acid
sequence <SEQ ID 5562>. Analysis of this protein sequence reveals the following: 



Final Results 

bacterial membrane --- Certainty=0.2041 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB15164 GB:Z99120 similar to pyrazinamidase/nicotinamidase 
[Bacillus subtilis] 
Identities = 90/179 (50%), Positives = 127/179 (70%) 

Query: 3 RALISIDYTNDFVADDGKLSAGKSAQAIATKIAEVTKTAFDQGDYIFFAIDCHDQNDSWH 62 

+ALI IDYTNDFVA DGKL+ G+ 4 I I +TK GDY+ A+D HD+ D +H 

Sbjct: 3 KALICIDYTNDFVASDGKlTCGEPGRMIEEAIVNIiTKEFITNGDYvVLAVDSHDEGDQYH 62 

Query: 63 PESKLFAAHNIKGTTGRHLYGPLAEVYSYMKQHPRVFWIDKRYYSAFSGTDLDIRLRERG 122

PE++LF HNIKGT G+ LYG L +Y + P V4-+++K YSAF+GTDL+++LRER
Sbjct: 63 PETRLFPPHNIKGTEGKDLYGKLLPLYQKEEHEPNVYYMEKTRYSAFAGTDLELKLRERQ 122

Query: 123 ITQLVLTGVLSDICVLHTAIDAYHLGYQLEIVKSAVASLTKESYEWSLAHFEQVLGAKL 181 

I +L L GV +DICVLHTA+DAY+ G+++ + K AVAS +E + W+L+HF +GA++ 
Sbjct: 123 IGELHLAGVCTDICVLHTAVDAYNKGFRIX^KQAVASFfJQEGHAWALSHFANSIGAQV 181

An alignment of the GAS and GBS proteins is shown below. 

Identities = 121/180 (67%), Positives = 150/180 (83%)

Query: 3 KALISIDYTYDFVADDGKLTAGKPAQSIASAIADVTEKAYRSGDYIFFAIDNHDIGDVFH 62

+ALISIDYT DFVADDGKL+AGK AQ+IA+ IA+VT+ A+ GDYIFFAID HD D +H
Sbjct: 3 RALISIDYTNDFVADDGKLSAGKSAQAIATKIAEVTKTAFDQGDYIFFAIDCHDQNDSWH 62

Query: 63 PESNLFPEHNIKGTSGRNLYGPLGTLYETIKEDSRVFWIDKRHYSAFSGTDLDIRLRERR 122 

PES LF HNIKGT+GR+LYGPL +Y +K+ RVFWIDKR+YSAFSGTDLDIRLRER 
Sbjct: 63 PESKLFAAHNIKGTTGRHLYGPLAEVYSYMKQHPRVFWIDKRYYSAFSGTDLDIRLRERG 122 

Query: 123 VDTLILTGVLTDICTLHTAIDAYNLGYKIEVPA 182 

+ L+LTGVL+DICVLHTAIDAY+LGY++E+ +AVASL +++W+L HF+ VLGA ++ 
Sbjct: 123 ITQLVLTGVLSDICTLHTAIDAYHLGYQLEIVKSAVASLTKESYEWSLAHFEQVLGAKLI 182 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1788

A DNA sequence (GBSx1895) was identified in S.agalactiae <SEQ ID 5563> which encodes the amino
acid sequence <SEQ ID 5564>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1539 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
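
Every example in this section opens with the same sentence pattern: a GBSx identifier in parentheses, followed by the SEQ ID of the DNA sequence and the SEQ ID of the encoded amino acid sequence. A minimal Python sketch for pulling those three values out of the opening sentence is shown below. The function name and the dictionary keys are hypothetical, and the pattern assumes the identifier has already been normalized to digits (scanned text sometimes shows a letter "l" where a "1" is meant).

```python
import re

# DOTALL lets the pattern span the line break inside the sentence.
HEADER_RE = re.compile(
    r"\((GBSx\d+)\).*?<SEQ ID\s*(\d+)>.*?<SEQ ID\s*(\d+)>", re.DOTALL
)

def parse_example_header(text):
    """Return the GBSx name and the DNA/protein SEQ IDs, or None if absent."""
    m = HEADER_RE.search(text)
    if m is None:
        return None
    name, dna_id, protein_id = m.groups()
    return {
        "name": name,
        "dna_seq_id": int(dna_id),
        "protein_seq_id": int(protein_id),
    }

header = (
    "A DNA sequence (GBSx1895) was identified in S.agalactiae <SEQ ID 5563> "
    "which encodes the amino\nacid sequence <SEQ ID 5564>."
)
record = parse_example_header(header)
print(record)
```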

Example 1789 

A DNA sequence (GBSx1896) was identified in S.agalactiae <SEQ ID 5565> which encodes the amino
acid sequence <SEQ ID 5566>. This protein is predicted to be 3-hydroxyacyl-CoA dehydrogenase (hbd-10).
Analysis of this protein sequence reveals the following:

Possible site: 46

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -0.27 Transmembrane 3 - 19 ( 1 - 19)
INTEGRAL Likelihood = -0.11 Transmembrane 277 - 293 ( 277 - 294)

Final Results

bacterial membrane --- Certainty=0.1107 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF12219 GB:AE001862 3-hydroxyacyl-CoA dehydrogenase, putative
[Deinococcus radiodurans] 
Identities = 151/321 (47%) , Positives = 196/321 (61%) , Gaps = 36/321 (11%) 



+M+IK +TV GSGVLGSQIAFQ A+ G V +YDIND A+ K +E + KL

Query: 56
Sbjct: 51

Query: 116
Sbjct: 111

Query: 176
Sbjct: 139

Query: 236
Sbjct: 199

Query: 296
Sbjct: 259

Query: 356

+ F+KDIGMV L ++KEQ GYILN++LVP L +AL L

++D +T+DKTW + TGAP GP LD+IG+ T YNI N 4

KE +IDKG+ GAG GFY Y
Sbjct: 315 KENYIDKGKLGTATGEGFYKY 335

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

A related GBS gene <SEQ ID 8903> and protein <SEQ ID 8904> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 3 
SRCFLG: 0 

McG: Length of UR: 20 

Peak Value of UR: 1.55 
Net Charge of CR: 1 
McG: Discrim Score: -0.60 
GvH: Signal Score (-7.5): -3.93 

Possible site: 21 
>>> Seems to have no N-terminal signal sequence 
Amino Acid Composition: calculated from 1 
ALOM program count: 1 value: -0.11 threshold: 0.0 

INTEGRAL Likelihood = -0.11 Transmembrane 221 - 237 ( 221 - 238) 
PERIPHERAL Likelihood =4.61 6 
modified ALOM score: 0.52
icml HYPID: 7 CFP: 0.104 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.1044 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

37.5/60.5% over 278aa 

Archaeoglobus 

fulgidus 

EGAD|103851| 3-hydroxyacyl-CoA dehydrogenase Insert characterized
OMNI|AF2273 3-hydroxyacyl-CoA dehydrogenase (hbd-10) Insert characterized
GP|2648250|gb|AAB88983.1||AE000948 3-hydroxyacyl-CoA dehydrogenase (hbd-10) Insert characterized
PIR|A69534|A69534 3-hydroxyacyl-CoA dehydrogenase (hbd-10) homolog - Insert characterized
ORF01176(475 - 1431 of 1731) 

EGAD|103851|AF2273(17 - 295 of 668) 3-hydroxyacyl-CoA dehydrogenase {Archaeoglobus
fulgidus} OMNI|AF2273 3-hydroxyacyl-CoA dehydrogenase (hbd-10)
GP|2648250|gb|AAB88983.1||AE000948 3-hydroxyacyl-CoA dehydrogenase (hbd-10)
{Archaeoglobus fulgidus} PIR|A69534|A69534 3-hydroxyacyl-CoA dehydrogenase (hbd-10) homolog
- Archaeoglobus fulgidus
%Match = 14.8
%Identity = 37.5 %Similarity = 60.4

Matches = 106 Mismatches = 106 Conservative Sub.s = 65 

387 417 447 477 507 537 567 597 

KKRYYFKNNHTIYLLLDISFVKLSSKTFSNISIGGONnviTIK 

= : II = I |:1::| II I I =11= II I ===l 

MPRRWQVINMDWERIKnA^/LGAGLMSHGIAEVCAMAGyNVTMRDIKQEFVDRGM 



ERIKK-^KVYQS-EIETAKFAYSDKAKSIKYNKt^LPSLDHIFLSKVADSLDLIADLPNQITFSKNLDQAVSDADLVIE 
11= 111= I =l = = hl I =1 = =l = = !l I I I I I I I 

NMIKESLAKLEQKGKIKSAEEVLS RIKPTVDLEEAVKDADLVIE 

861 891 921 951 981 1011 1041 1071

AVPETVSIKEDFYKQIAKmPSKTIFATNSSTLVPSQFADITGRPDKPlJy/IHFA^IWQmiVEIMGHKGTDDEVIKI^ 
llll I II: = = = = 1=1 II =1=11= = =11 I 11=11 =11 I =ll== = I 111 = = 

5 AVPEVVEIKKQVVffiEVDKIAKPDCIFTSOTSTMRITMIADFTSRPEKFAGLHFFNPPVLMRLVEVIRGEKTSDEVMDLLV 
120 130 140 150 160 170 180 

1101 1131 1161 1191 1221 1251 1281 1311 

AFSKDIGMVPLHIHECEQPGYILNSILVPFLESALALYYDOTSDSETIDKTWKLGTGAPMGPLEILDIIGIDTAYNIMKNY 
10 | | || |: : |: || = l = l = I =1= == I =1 I = I 1111 = 1-1 1 = 1 II =1 I 

EFVKSIGKTPWVEKDVPGFIVNRVQAPASVL^IAILEKGIATPEETOATVR-RLGLPMGPFELVDYTGVDILYNALKYY 
200 210 220 230 240 250 260 

1341 1371 1401 1431 1461 1491 1521 1551 

15 SDTNSDPNSLHAHLAKMLKEEFIDKGRTGKAAGHGFYDYD*TIKETO*KS1^FYNSTKE*LHQEQF*NDLKPIDDYYHLS 
=111= = = II = : 1= 1=1111= == = = I 

AQTIS-PD YEPPKFLEEMVKANKLGRKTGQGFYDWSXGRPQ1DSSKATDKINPMDFTFVEINEAVKLVEMGVATPQ 

270 280 290 300 310 320 330 

SEQ ID 8904 (GBS112) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 28 (lane 5; MW 39kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 34 (lane 3; MW 64kDa).

GBS112-GST was purified as shown in Figure 198, lane 10.

Example 1790

A DNA sequence (GBSx1897) was identified in S.agalactiae <SEQ ID 5567> which encodes the amino
acid sequence <SEQ ID 5568>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3332 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10097> which encodes amino acid sequence <SEQ ID 
10098> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14467 GB:Z99117 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 62/169 (36%) , Positives = 109/169 (63%) , Gaps = 3/169 (1%) 

Query: 1 MAVLSMLGIIDAKPKVGYFYLGQYHASIGTSHFE™tVSEIMGIPLTVHQKDSVYDVIVH' 60 

+A+L+M G ++A+P+VGYFY G+ + +K+ V + IP+ +H+ SVYD I 

Sbjct: 43 LAI LTMSGFLEARPRVGYF YTGKTGTQLLADKLKKLQVKDFQS I P WI HENVS VYDAI CT 102 

Query: 61 IF^ffiDAGCAFILDDDDFLCGWSRKDLLKISIGGGDLSKMPIG^lVMTRMPHVTTvLENES 120 

+F+ED G F++D D L GV+SRKDLL+ SIG +L+ +P+ ++MTRMP++T 4 
Sbjct: 103 MFLFlJVGTLFVVDRDAVLVGVLSRKDLLSASIGQQELTSVPvHIIMTRMPNITVCRREDY 162 

Query: 121 LFARADKLVSRKVDSLPVVRHDKQYPEKFKVIGKLSKTILASLFLEIRD 169 

+ A L+ +++D+LPV4 K + F+VIG+++KT + + + + + 
Sbjct: 163 VMDIAKHLIEKQIDAIjPVI KDTDKGFEVIGRVTKTNMTK1LVSLSE 20B 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1791

A DNA sequence (GBSx1898) was identified in S.agalactiae <SEQ ID 5569> which encodes the amino
acid sequence <SEQ ID 5570>. Analysis of this protein sequence reveals the following: 

Possible site: 22
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.53 Transmembrane 60 - 76 ( 60 - 76)

Final Results

bacterial membrane --- Certainty=0.1213 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB05092 GB:AP001511 unknown conserved protein [Bacillus halodurans]
Identities = 126/256 (49%), Positives = 183/256 (71%), Gaps = 1/256 (0%)



Query: 7 IFIISDSLGETAKAIAKACLSQFPGHDDKHFQRFSYINSQERLEQVFEEASQKTOFMMFS 66
 ++++SDS+GETA+ + KA SQF G +R Y+ +E +++V + A Q + F+
Sbjct: 10 VYWSDSVGETAELWKAAASQFSGAGI-EVRRIPYVEDKETVDEVIQLAKQADAIIAFT 68

Query: 67 LVDVALASYAQKRCESEHyAYVDLLTNVIQGISRISGIDPLGEPGILRRLDKDYFKRVES 126
 LV + +Y ++ VD++ +++ IS ++ +P EPGI+ RLD DYF++VE+
Sbjct: 69 LWPGIRTYLLEKATEAJWETVDIIGPMLEKISSLTKEEPRYEPGIVYRLDEDYFRKVEA 128

Query: 127 IEFAVKYDDGRDPRGILQADLVIIGISRTSKTPLSMFLADKNIKVINIPLVPEVPVPKEL 186
 IEFAVKYDDGRDPRGI++ADLV+IG+SRTSKTPLS +LA K +KV N+PLVPEV P+EL
Sbjct: 129 IEFAVKYDDGRDPRGIVRADLVLIGVSRTSKTPLSQYLAHKRLKVANVPLVPEVEPPEEL

Query: 187 RMIDSRRIIGLTNSVBHIjNQWKVRIjKSIiGLSSTANYASLERILEETRYAEEvWKNLGCP 246
 + +++IGL S + LN +R RLK+LGL S ANYA+++RI EE YAE +MK +GCP
Sbjct: 189 FKLSPKKVIGLKISPEQLNGIRAERLKTLGLKSQANYANIDRIKEEIAYAEGIMKRIGCP 248

Query: 247 IINVSDKAIEETATII 262
 +I+VS+KA+EETA +I
Sbjct: 249 VIDVSNKAVEETANLI 264


No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 5570 (GBS378) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 68 (lane 4; MW 34kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 72 (lane 2; MW 59kDa).

GBS378-GST was purified as shown in Figure 212, lane 6. 
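
The apparent molecular weights reported for the His-fusion and GST-fusion forms of the same protein can be cross-checked against each other, since the two constructs should differ mainly by the mass of the tag. The Python sketch below does this for GBS431 and GBS378 using the values quoted in this section. The tag masses (about 26 kDa for GST and about 1 kDa for a His tag) are assumptions that do not come from this document, and the tolerance is an arbitrary illustrative choice.

```python
# Apparent molecular weights (kDa) quoted in the SDS-PAGE descriptions.
FUSIONS = {
    "GBS431": {"his": 29, "gst": 54},
    "GBS378": {"his": 34, "gst": 59},
}

# Assumed tag masses in kDa; these are not stated in the document.
GST_TAG_KDA = 26.0
HIS_TAG_KDA = 1.0

def tag_shift(entry):
    """Observed mass difference between the GST and His fusions."""
    return entry["gst"] - entry["his"]

def is_consistent(entry, tolerance_kda=2.0):
    """True if the observed shift matches the expected tag difference."""
    expected = GST_TAG_KDA - HIS_TAG_KDA
    return abs(tag_shift(entry) - expected) <= tolerance_kda

for name, entry in FUSIONS.items():
    print(name, tag_shift(entry), is_consistent(entry))
```

Under these assumptions both proteins show a 25 kDa shift between constructs, which is what a GST tag replacing a His tag would be expected to produce.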

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 1792 

A DNA sequence (GBSx1899) was identified in S.agalactiae <SEQ ID 5571> which encodes the amino
acid sequence <SEQ ID 5572>. Analysis of this protein sequence reveals the following:

Possible site: 47

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3703 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAD35361 GB:AE001709 pyruvate, orthophosphate dikinase
[Thermotoga maritima] 
Identities = 494/882 (56%), Positives = 639/882 (72%), Gaps = 9/882 (1%) 

Query: 1 METKFVYHFD EGCKEMKELLGGKGANLAEMTSIGLPVPQGFTITTQACNDYYDNAC 56 

M K+VY F EG +MK++LGGKGANLAEMT++G+PVP GFTI+ + C YYD+ 

Sbjct: 1 MAKK^FFANGKAEGRADMKDILGGKGANIJWiTl^GIPVPPGFTISAEVCKyYYDHGR 60

Query: 57 HIRESILSQIDQAMAQLETOQNKQLGSvDDPLLVSTOSGSVFSMPGMMDTVMLGLNDRS 116 

E + Q+++AM +LE K+ G ++PLLVSVRSG+ SMPGMMDTVLNLGLND + 
Sbjct: 61 TYPEELKEQVEFJMRRLEEVTGKKFGDPlOTPLLVSVRSGAAISMPG^TvTMLGLNDET 120 

V+GL K T +ERFAYD+YRRF+QMF DW IP KF+ L+ LK +K + DTEL D 
Sbjct: 121 VKGIAKLTNNERFAYDAYRRFLQMFGDVVLKIPKEKFEKALEELKKEKGVKLDTELDAED 180 

Query: 177 LKRLVEFYKELYQKmGEKFPQDPKRQLIJ^IFAVFKSV^PRAKIYRKLNDIPE--TLG 234 

LK+LVE YK++Y KE G++FPQDP +QL LAI+AVF SW N RA YR+++ IE LG 
Sbjct: 181 LKI<liVERYKQIY-ICEEGKEFPQDPWKQLWLAIDAVFGSVMHERAIKYRQIHGIKEGDLLG 239 

Query: 235 TAVNIQAMVFGNMGNNSGTGVAFTRNPSTGAANLFGEYLINAQGEDWAGIRTPQSISKL 294 

TAVNI AMVFGNMG + SGTGVAFTR+ P+TG +GE+L NAQGEDWAGIRTP + +L 
Sbjct: 240 TAWIVAMVFGNMGEDSGTGVAFTRDPNTGEKKPYGEFLPNAQGEDWAGIRTPLKIiEEL 299 

Query: 295 AEQMPIIYQEFVSVTQKLEAHYRDMQDMEFTIENGNLYMLQTRSGKRTAKAAIKIAVDQV 354 

+MP +Y + + + KLE HYRDMQD+EFT+E G LY+LQTR+GKRT+ +AAI + IAVD V 
Sbjct: 300 KHRMPEVWQLLEIMDKLEKElYRDMQDIEFIVERGKliYILQTRISIGKRTSQAAIRIAVDMV 359 

Query: 355 NEGLISKEEAILRIEPKQLDQLLHPSFDLKSLKKAIILTTGLPASPGAAYGKVYFilAEDV 414 

+EGM+KEEAILR+ P+ ++Q+LHP FD K +A ++ GIiPASPGAA GKV F+A+ 
Sbjct: 360 HEGLITKEEAILRWPEDVEQVLHPVFDPKEKAQAKVIAKGLPASPGAATGKWFNAKKA 419 

Query: 415 VKEMKKGNPVlLVRQETSPEDIEGWSANGIITARGGMrSHAAVVARGMGKPCVAGCSQL 474 

+ KG V+LVR ETSPED+ GM +A GI+T+RGGMTSHAAWARGMGKP V G + 
Sbjct: 420 EELGKAGEQVILWPET8PEDVGGMAAAQGILTSRGGMTSHAAWARGMGKPAWGAESI 479 

Query: 475 LVDEWREISIGHQTIKEGEMLSIDGATGNVYIGQV-PMAETSVDRDFEIFMKWVDENRD 533 

V +G +KEGE +SIDG TG V +G+V + ++ ++W DE R 

Sbjct: 480 EWPEEGYFKVGD^/VVT^GEWISIDGTTGEVLLGKvTTIKPQGLEGPVAELLQWADEIRR 539 

Query: 534 MWCSNADNPRDAQKAIaDFGAEGIGLCRTEHMFFDDERIPVVREMILADEILSRRKALER 593 

+ V +NAD PRDA+ A FGAEGIGLCRTEHMFF+ +RIP VR MILA R KAL+ 

Sbjct: 540 LGTOTN7ADIPRDAEVARKFGAEGIGLCRTEHKFFEICDRIPKVRRMILAKTKEEREKALDE 599 

Query: 594 LLSFQRDDFYQIFKVLKGKACTIRLLDPPLHEFLPHDKESIESMARQMGISTLAIEKRIQ 653 

LL Q++DF +F+V+KG TIRL+DPPLHEFLP + E 1+ +A QMG+S ++ ++ 
Sbjct: 600 LLPLQKEDFKGLFRVMKGLPvTIRLIDPPLHEFLPQEDEQIKEVAEQMGVSFEELEOWVE 659 

Query: 654 TLEEFNPMLGHRGCRIAITYPEIYQMQVRALVQGAI-LAMKEGYEAKPEIMIPLVTAHEE 712 

L+E NPMLGHRGCRL ITYPEI MQ +A++ AI L +EG + PEIMIPLV E 
Sbjct: 660 KfLKELNPMLGHRGCRLTITYPEIAVMQTKAIIGAAIELKKEEGIDVIPEIMIPLVGHVME 719 

Query: 713 ISIIRDLIEETIVEESKSKKINLSFPIGTMIETPRACMIADDIAKFADFFSFGTNDLTQM 772 

+ ++ +I+ET K + L++ IGTMIE PRA + A IA+ A+ FFS FGTNDLTQM 

Sbjct: 720 LRYLKKIIKETADALIKFAGvEL"YKIGTMIEVPRAAVTAHQIAEEAEFFSFGTNDLTQM 779 

Query: 773 SFGFSRDDAGKFLGEYVDKGLLKKDPFQVXDQKGIGRFIGQAVRLGKEVKPNLKIGICGE 832 

+FGFSRDD GKFL EY++KG+L+ DPF+ LD G+G + G+ +P+LK+G+CGE 

Sbjct: 780 TFGFSRDDVGKFLPEYLEKGILEHDPFFOIBVDCTGELVRMGKEKGRSTRPDLKVGVCGE 839 

Query: 833 HGGEPSSIEFCYQLGLHYVSCSPFRIPIAKLAAAQAKIKQSR 874 

HGG+P SI F ++GL YVSCSP+R+P+A+LAAAQA +K + 
Sbjct: 840 HGGDPRSILFFDKIGLDYVSCSPYRVPV7ARLAA&QAALKNKK 881 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1793 

A DNA sequence (GBSx1900) was identified in S.agalactiae <SEQ ID 5573> which encodes the amino
acid sequence <SEQ ID 5574>. This protein is predicted to be glutamyl-tRNA (Gln) amidotransferase
subunit C (gatC). Analysis of this protein sequence reveals the following:

Possible site: 56

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3229 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB04384 GB:AP001509 glutamyl-tRNA (Gln) amidotransferase
subunit C [Bacillus halodurans]
Identities = 42/94 (44%), Positives = 63/94 (66%)

Query: 2 KISEEEVRHVANLSKLRFSDQETKEFASSLSKIVDMIELLNEVDTEGVPVTTTMADRKTV 61

+ IS E+V+HVA+L++L +++E K F L I+ E LNE+DTEGV T+ + D K V
Sbjct: 3 RISMEQVKHVAHLARLAITEEEAKLFTEQLGDIIQFAEQLNELDTEGVEPTSHVLDMKNV 62

Query: 62 MREDIAQPGHNRDDLFKNVPQHQDYYIKVPAILE 95
+RED + G +D+ KN P H+D I+VP++LE

Sbjct: 63 LREDKPEKGLPVEDVLKNAPDHEDGQIRVPSVLE 96

A related DNA sequence was identified in S.pyogenes <SEQ ID 5575> which encodes the amino acid 
sequence <SEQ ID 5576>. Analysis of this protein sequence reveals the following: 

Possible site: 60

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3247 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 72/100 (72%), Positives = 88/100 (88%)

Query: 1 MKISEEEVRHVANLSKLRFSDQETKEFASSLSKIVDMIELLNEVDTEGVPVTTTMADRKT 60

MKISEEEVRHVA LSKL FS+ ET FA++LSKIVDM+ELLNEVDTEGV +TTTMAD+K
Sbjct: 5 MKISEEEVRHVAKLSKLSFSESETTTFATTLSKIVDMTOLIjNEVDTEGVAITTTMADKKN 64

Query: 61 VMREDIAQPGHNRDDLFKNVPQHQDYYIKVPAILEDGGDA 100

VMR+D+A+ G +R LFKNVP+ ++++IKVPAIL+DGGDA
Sbjct: 65 VMRQDVAEEGTDRALLFKNVPEKENHFIKVPAILDDGGDA 104

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1794 

A DNA sequence (GBSx1901) was identified in S.agalactiae <SEQ ID 5577> which encodes the amino
acid sequence <SEQ ID 5578>. Analysis of this protein sequence reveals the following: 

Possible site: 30

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -7.64 Transmembrane 7 - 23 ( 6 - 24)

Final Results

bacterial membrane --- Certainty=0.4057 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 1795 

A DNA sequence (GBSx1902) was identified in S.agalactiae <SEQ ID 5579> which encodes the amino
acid sequence <SEQ ID 5580>. This protein is predicted to be glutamyl-tRNA amidotransferase, subunit A 
(gatA). Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2855 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB04385 GB:AP001509 glutamyl-tRNA (Gln) amidotransferase
subunit A [Bacillus halodurans] 
Identities = 285/486 (58%) , Positives = 367/486 (74%) , Gaps = 4/486 (0%) 

Query: 1 MSFNNQSIDQLHDFLVKKEISATELTKATLEDIHAREQAVGSFITISDEMAIAQAKEID- 59
 MS + + +H L +KEIS ++L +1 + V +F+ +++E A A AKE+D
Sbjct: 1 MSIiFDLKLKDVHTKLHEKEISVSDLvDEAYKRIEQVDGQVEAFLALNEEKARAYAKELDA 60

Query: 60
Sbjct: 61

Query: 118
 K NMDEFAMG STE S F+KT N W+ VPGGSSGGSAAAVA+G+V +LGSDTGGSI
Sbjct: 120

Query: 178
 RQPA++ G+VG+KPTYGRVSR+GL AF SSLDQIGP+++ V++NA LL ISGHD DST
Sbjct: 180

Query: 238
 S+ V D+ + + DI+G+KIA+PKEYLGEG+ + VK++++ A K LE LGA EEVSL
Sbjct: 240

Query: 298
Sbjct:

Query: 358
 MLGTF+LSSGYYDAYYKKA QVR+LI QDFEKVF YD+I+GPT PT AF + DP+
Sbjct: 360 MLGTFALSSGYYDAYYKKAQQVRTLIKQDFEKVFEQYDVIIGPTTPTPAFKIGEKTDDPL 419

Query: 418 AiWLADILTIPVNLAGLPGISIPAGFDQGLPVCMQLIGPKFSEETIYQVAAAFEATTDYH 477

Query: 478 KQQPKI 483
++P + 

Sbjct: 480 TKRPTL 485 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5581> which encodes the amino acid 
sequence <SEQ ID 5582>. Analysis of this protein sequence reveals the following: 

Possible site: 57 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2364 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 392/487 (80%), Positives = 442/487 (90%) 

Query: 1 MSFNNQSIDQLHDFLVKKEISATELTKATLEDIHAREQAVGSFITISDEMAIAQAKEIDD 60 

MSFN++4I++LHD LV KEISATELT+ATLEDI +RE+AVGSFIT+S+E+A+ QA ID 
Sbjct: 1 MSFNHKTIEELHDLLVAKEISATELTQATLEDIKSREEAVGSFITVSEEVALKQAAAIDA 60

Query: 61 KGIDADNVMSGIPIiAVKDNISTKGILTTAASKMLYNYEPIFDATAVEKLYAKDMIVIGKA 120 

KGIDADN+MSGIPIAVKDNISTK ILTTAASKMLYNYEPIF4AT+V YAKDMIVIGK 
Sbjct: 61 KGIDADNLMSGIPLAVKDNISTKEILTTAASKMLYNYEPIFNATSVANAYAKDMIVIGKT 120 

Query: 121 NMDEFAMGGSTETSYFKKTNNAWDHSKVPGGSSGGSAAAVASGQVRLSLGSDTGGSIRQP 180 

NMDEFAMGGSTETSYFKKT NAWDH+KVPGGSSGGSA AVASGQVRLSLGSDTGGSIRQP 
Sbjct: 121 NMDEFAMGGSTETSYFKKTKNAVJDHTKVPGC-SSGGSATAVASGQVRLSLGSDTGGSIRQP 180 

Query: 181 ASFNGIVGMKPTYGRVSRFGLFAFGSSLDQIGPMSQTVKENAQLLTVISGHDVRDSTSSE 240 

A+FN +VG+KPTYG VSR+GL AFGSSLDQIGP + TVKENAQLL VI+ DV+D+TS+ 
Sbjct: 181 AAFNSWGLKPTYGWSRYGLIAFGSSLDQIGPFAPTVKENAQLLNVIASSDVKDATSAP 240 

Query: 241 RTVGDFTAKIGQDIQGMKIALPKEYLGEGIAQGVKETIIKAAKHLEKLGAVIEEVSLPHS 300 

+ D+T+KIG+DI+GMKIALPKEYLGEGI +KET++ + K E LGA +EEVSLPHS 
Sbjct: 241 WIADYTSKIGRDIKGMKIALPKEYLGEGIDPEIKETVLASVKQFEALGATVEEVSLPHS 300

Query: 301 K^GVAVYYIVASSE^SSNLQRFDGIRYGYRTENYKNLDDIYVNTRSEGFGDEVI<RRIMLG 360 

KYGVAVYYI +ASSEASSNLQRFDGIRYG+R ++ KNLD+IYVNTRS+GFGDEVKRRIMLG 
Sbjct: 3 01 KYGVAVYYI IASSFJ^SNLQRFDGIRYGFRADDAKNLDEIYVNTRSQGFGDEVKRRIMLG 3 60 

Query: 361 TFSLSSGYYDAYYKKAGQVRSBIIQDFEKVFADYDLILGPTAPTTAFDLDSLNHDPVAMY 420 

TFSLSSGYYDAY+KKAGQVR+LIIQDF+KVFADYDLILGPT PT AF LD+LNHDPVAMY 
Sbjct: 361 TFSLS SGYYDAYFKKAGQVRTIi 1 1 QDFDKVFADYDLI LGPTTPTVAFGLDTLNHDPVAMY 420 

Query: 421 LADILTIPVNLAGLPGISIPAGFDQGLPVGMQLIGPKFSEETIYQVAAAFEATTDYHKQQ 480

LAD+LTIPVNLAGLPGISIPAGF GLPVG+QLIGPK++EETIYQ AAAFEA TDYHKQQ
Sbjct: 421 LADLLTIPVNLAGLPGISIPAGFVDGLPVGLQLIGPKYAEETIYQAAAAFEAVTDYHKQQ 480

Query: 481 PKIFGGE 487 

P IFGG+ 
Sbjct: 481 PIIFGGD 487 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 




Example 1796 

A DNA sequence (GBSx1903) was identified in S.agalactiae <SEQ ID 5583> which encodes the amino
acid sequence <SEQ ID 5584>. This protein is predicted to be glutamyl-tRNAGln amidotransferase subunit 
B (gatB). Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0. 3935 (Affirmative) < suco 

10 bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside -— Certainty=0 . 0000 (Not Clear) < suco 
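Localization summaries of this kind can be tabulated by script. The following is a minimal sketch for illustration only, not part of the original analysis. It assumes each result line has the form "bacterial <site> --- Certainty=<value> (<call>)" and tolerates stray spaces inside the number, as found in scanned copies. The function names parse_final_results and predicted_site are hypothetical.

```python
import re

# One "Final Results" line: site name, certainty value, qualitative call.
# The value pattern allows stray spaces (e.g. "0. 3935") left by scanning.
LINE_RE = re.compile(
    r"bacterial\s+(?P<site>cytoplasm|membrane|outside)\W*"
    r"Certainty\s*=\s*(?P<value>[\d.\s]+?)\s*\((?P<call>[^)]*)\)"
)


def parse_final_results(text):
    """Return {site: (certainty, call)} for every result line found in text."""
    results = {}
    for m in LINE_RE.finditer(text):
        value = float(m.group("value").replace(" ", ""))
        results[m.group("site")] = (value, m.group("call").strip())
    return results


def predicted_site(results):
    """Return the site with the highest certainty value."""
    return max(results, key=lambda site: results[site][0])


example = """
bacterial cytoplasm --- Certainty=0.3935 (Affirmative) < succ>
bacterial membrane --- Certainty=0 . 0000 (Not Clear) < succ>
bacterial outside --- Certainty=0. 0000 (Not Clear) < succ>
"""
parsed = parse_final_results(example)
print(predicted_site(parsed), parsed)
```

Applied to each example in turn, such a helper gives one predicted compartment per protein, which is convenient when comparing the GBS and GAS homologues.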

A related GBS nucleic acid sequence <SEQ ID 10095> which encodes amino acid sequence <SEQ ID 
10096> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB04386 GB:AP001509 glutamyl-tRNA(Gln) amidotransferase 
subunit B [Bacillus halodurans] 
Identities = 308/476 (64%), Positives = 361/476 (75%), Gaps = 1/476 (0%) 

Query: 1 MNFETVIGLEvHvEtmNSKIFSPSSTAHFGQEQNANTWIDWSFPGVLPVMNKGVIDAGI 60 

Sbjct: 1 

Query: 61 KATALAINMDIHQNMHFDRKNYFYPDNPKAYQISQFDEPIGYNGWIEIELEDGTRKKIRIE 120 

KAA+ALN ++ + FDRKNYFYPDNPKAYQISQFD+PIG NGWIEIE+ DGT+KKI I 
Sbjct: 61 KAAMALNCEVATDTKFDRKNYFYPDNPKAYQISQFDKPIGENGWIEIEV-DGTKKKIGIT 119 

Query: 121 RAHLEEDAGKNTHGTDGYSYVDLNRQGVPLIEIVSEADMRSPEEAYAYLTALKEIIQYTG 180 
R HLEEDAGK TH +GYS VD NRQG PLIEIVSE D+R+P+EAYAYL LK IIQYTG 
Sbjct: 120 RLHLEEDAGKLTHSGNGYSLVDFNRQGTPLIEIVSEPDIRTPQEAYAYLEKLKSIIQYTG 179 

Query: 181 ISDVKMEEGSMRVDANISLRPYGQEEFGTKAELKNLNSFNNVRKGLIHEEKRQAQVLRSG 240 

+SD KMEEGS+R DANISLRP GQEEFGTK ELKNLNSFN VRKGL +EEKRQAQVL SG 
Sbjct: 180 VSDCKMEEGSLRCDANISLRPVGQEEFGTKTELKNLNSFNFVRKGLEYEEKRQAQVLLSG 239 

Query: 241 GQIQQETRRFDETTGETILMRVKEGSSDYRYFPEPDLPLFDISDEWIDQVRLELPEFPQE 300 

G+I QETRR+DE 4T+LMRVKEGS DYRYFPEPDL I DEW ++R E+PE P 
Sbjct: 240 GEILQETRRYDFJ^KTVLMRVKEGSDDYRYFPEPDLVALHIDDEMKARIRSEIPELPnA 299 

Query: 301 RRAKYVSSFGLSSYDASQLTATKRTSDFFEKAVAIGGDAKQVSNWLQGEVAQFLNSESKS 360 

R+- +YV GL +YDA LT TK SDFFE+ +A G D K SNWL GEV+ +LN+E K 
Sbjct: 300 RKKRYvEELGLPAYDAMvIjTLTKEMSDFFEETIAKGADPKIjASNWLMGEVSGYLNAEQKE 359 

Query: 361 IEEIGLTPENLVEMIGLIADGTISSKIAKKVFVHLAKNGGSAEEFVKKAGLVQISDPEVL 420 
++E+ LTP+ L +MI LI GTISSKIAKKVF L + GG EE VK GLVQISD L 

Sbjct: 360 LDEVALTPDGLAKMIQLIEKGTISSKIAKKVFKDLIEKGGDPEEIVKAKGLVQISDEGEL 419 

Query: 421 IPIIHQVFADNEAAVIDFKSGKRNADKAFTGYLMKATKGQANPQVALKLLAQEIAK 476 
+ +V +N+ ++ DFK+GK A G +MKATKG+ANP + KLL +E+ K 

Sbjct: 420 RKYVVEVljDNNQQSIDDFKNGKDRAIGFLVGQIMKATKGKANPPMVNKLLIjEEINK 475 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5585> which encodes the amino acid 
sequence <SEQ ID 5586>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3935 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 410/479 (85%) , Positives = 447/479 (92%) 

Query: 1 mFEWIGLEVHWUOTSKIFSPSSAHFGQEQNANTNVIDWSFPGVLPA/MNKGVIDAGI 60 

MNFET+IGLEVHVELNTNSKIFSPSSAHFG++ NANTNVIDWSFPGVLPVMNKGVIDAGI 
Sbjct: 1 MNFETIIGLEVHVELIMSKIFSPSSAHFGEDPNAOTWIDWSFPGVLPVMNKGVIDAGI 60 

Query: 61 KAAIAIMDIHQNMHFDRKNYFYPDNPKAYQISQFDEPIGXNGf7IEIELEDGTRKKIRIE 120 

KAALALNMDIH+ MHFDRKNYFYPDNPKAYQISQFDEPIGYNGWI + I+LEDG+ KK1RIE 
Sbjct: 61 KAALAMMDIHKEMHFDRKNYFYPDNPKAYQISQFDEPIGYNGWIDIKLEDGSTKKIRIE 120 

Query: 121 RAHLEEDAGKNTHGTDGYSYVDLNRQGVPLIEIVSEADMRSPEEAYAYLTALKEIIQYTG 180 

RAHLEEDAGKNTHGTDGYSYVDLNRQGVPLIEIVSSADMRSPEEAYAYLTALKEIIQYTG 
Sbjct: 121 RAHLEEDAGKNTHGTDGYSYVDLNRQGVPLIEIVSEADMRSPEEAYAYLTALKEIIQYTG 180 

Query: 181 ISDVKMEEGSMRVDANISLRPYGQEEFGTKAELKNLNSFNNVRKGLIHEEKRQAQVLRSG 240 

ISDVKMEEGSMRVDANISLRPYGQE+FGTK ELKNLNSF+NWKGL E +RQA++LRSG 
Sbjct: 181 ISDVKMEEGSMRVDANISLRPYGQEQFGTKTELKNLNSFSNVRKGLEFEVERQAKLLRSG 240 

Query: 241 GQIQQETRRFDETTGETILMRVKSGSSDYRYFPEPDLPLFDISDEWIDQVRLELPEFPQE 300 

G I+QETRR+DE TILMRVKEG++DYRYFPEPDLPL++I D WID+-I-R +LP+FP + 
Sbjct: 241 GVIRQETRRYDEANKGTILMRVKEGAADYRYFPEPDLPLYEIDnAWIDEMRAQLPQFPAQ 300 

Query: 301 RRAKYVSSFGLSSYDASQLTATKATSDFFEKAVAIGGDAKQVSNWLQGEVAQFLNSESKS 360 

RRAKY GLS+YDASQLTATK SDFFE AV++GGDAKQVSNWLQGEVAQFLN+E K+ 
Sbjct: 301 RRAKYEEELGLSAYDASQLTATKVLSDFFETAVSLGGDAKQVSNWLQGEVAQFLNAEGKT 360 

Query: 361 IEEIGLTPENLVEMIGLIADGTISSKIAKKVFVHLAKNGGSAEEFVKKAGLVQISDPEVL 420 

IEEt LTPENLVEMI +IADGTISSK+AKKVFVHLAKNGGSA +V+ KAGLVQI SDP VL 
Sbjct: 361 IEEIALTPENLVEMIAIIADGTISSKMAKKVFVHLAKNGGSARAYVEKAGLVQISDPAVli 420 

Query: 421 IPIIHQVFADNFJ^VIDFKSGKRNADKAFTGYIMKAITCGQANPQVALKLLAQELAKLKE 479 

+PIIHQVFADNEAAV DFKSGKRNADKAFTG+LMKATKGQANPQVA +LLAQEL KL+ + 
Sbjct: 421 VPIIHQVFADNEAAVADFKSGKRNADKAFTGFLMKATKGQANPQVAQQLIAQEI.QKLRD 479 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
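The pairwise alignments in these examples are printed in blocks of up to 60 columns, each labelled line carrying its start and end coordinate. As a minimal sketch (illustrative only; the function names stitch_alignment and coordinates_consistent are hypothetical), the blocks can be rejoined into one sequence per label, and the printed coordinates can be used to flag lines whose residue count no longer matches, which is a common symptom of scanning damage.

```python
import re

# A labelled alignment line: label, start coordinate, residues, end coordinate.
BLOCK_RE = re.compile(r"^(Query|Sbjct):\s*(\d+)\s+(\S+)\s+(\d+)\s*$")


def stitch_alignment(lines):
    """Concatenate the residue strings of successive Query/Sbjct blocks."""
    rows = {
        "Query": {"start": None, "end": None, "seq": ""},
        "Sbjct": {"start": None, "end": None, "seq": ""},
    }
    for line in lines:
        m = BLOCK_RE.match(line.strip())
        if m is None:
            continue  # match lines and blank lines carry no label
        row = rows[m.group(1)]
        if row["start"] is None:
            row["start"] = int(m.group(2))
        row["end"] = int(m.group(4))
        row["seq"] += m.group(3)
    return rows


def coordinates_consistent(row):
    """True if the non-gap residue count equals end - start + 1."""
    residues = len(row["seq"].replace("-", ""))
    return row["end"] - row["start"] + 1 == residues


blocks = [
    "Query: 121 RAHLEEDAGKNTHGTDGYSYVDLNRQGVPLIEIVSEADMRSPEEAYAYLTALKEIIQYTG 180",
    "Sbjct: 121 RAHLEEDAGKNTHGTDGYSYVDLNRQGVPLIEIVSEADMRSPEEAYAYLTALKEIIQYTG 180",
    "Query: 181 ISDVKMEEGSMRVDANISLRPYGQEEFGTKAELKNLNSFNNVRKGLIHEEKRQAQVLRSG 240",
    "Sbjct: 181 ISDVKMEEGSMRVDANISLRPYGQEQFGTKTELKNLNSFSNVRKGLEFEVERQAKLLRSG 240",
]
rows = stitch_alignment(blocks)
print(rows["Query"]["start"], rows["Query"]["end"], len(rows["Query"]["seq"]))
```

A block that fails the coordinate check has gained or lost characters and should be compared against the sequence listing before it is relied upon.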

Example 1797 

A DNA sequence (GBSx1904) was identified in S.agalactiae <SEQ ID 5587> which encodes the amino 
acid sequence <SEQ ID 5588>. Analysis of this protein sequence reveals the following: 



>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = 

INTEGRAL Likelihood = 

INTEGRAL Likelihood = 

INTEGRAL Likelihood = 

INTEGRAL Likelihood = 

INTEGRAL Likelihood = 

INTEGRAL Likelihood = 

INTEGRAL Likelihood = 



Transmembrane 108 - 124 ( 105 - 
Transmembrane 278 - 294 ( 268 - 
Transmembrane 191 - 207 ( 188 - 208) 



Transmembrane 219 - 235 ( 215 - 

1 - 57 ( 39 - 

2 - 148 ( 131 - 150) 
4 - 270 ( 253 - 

03 Transmembrane 79 - 95 ( 79 - 



Final Results 

bacterial membrane --- Certainty=0.3909 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



A related GBS nucleic acid sequence <SEQ ID 10093> which encodes amino acid sequence <SEQ ID 
10094> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

TKKEKGTMMTLAAGIAWGISGISGQYLMSH-GVHVNLLTSLRIiLITGIFLLSLARSKQKE 64 
+++ G ++ + WG+SG QYL H + L +R+L++G+ LL++A SKQ+ 

SRRAWGLLLVI IGATMWGVSGTVAQYLFQHK3 FNAEWLWVRMLVSGLLLIAIA- SKQR- 58 



E+I++ + I GT+ +AT G 



+W S V+G GM IGG FS + W + ++4S+ A + +1 GT+ A+ +L+ H 



f A + +LAS EP+S+ L+VL L 



vaccines or diagnostics. 
Example 1798 

A DNA sequence (GBSx1905) was identified in S.agalactiae <SEQ ID 5589> which encodes the amino 
acid sequence <SEQ ID 5590>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2103 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 10091> which encodes amino acid sequence <SEQ ID 
10092> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14510 GB:Z99117 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 52/153 (33%) , Positives = 88/153 (56%) 

Query: 17 YRPTFWEAVYDLRAEDLLRHGIRAVLVDLDNTLIAWNNPDGTAEVRAWLDEMTTADISV 76 
45 + P V+ ++ + E L ++ ++ DLDNTL+ W+ P+ T + W +EM I V 

Sbjct: 6 FLPDEFVKNIFHITPEKLKERNVKGIITDLDNT'LVEWDRPNATPRLIEWFEEMKEHGIKV 65 

Query: 77 VWSNNTfflARVERAVSRFGVDFVSRAMKPFTRGINMAIERYGFDRDEVIMVGDQLMTDIR 136 
4-VSNNN RV+ G+ F4 +A KP + N A+ +++ +++GDQL+TD+ 

50 Sbjct: 66 TIVSNNNERRVKLFSEPLGIPFIYKARKP^KAFNRAVRN^LKKEDCWIGDQLLTDV^ 125 

Query: 137 ASHRAGIKSVLVKPIVKSDAWHTKFNRI.RERRV 169 

+R G ++LV P+ SD + T+FNR ERR+ 
Sbjct: 126 GGNRNGYHTILWPVASSDGFITRFNROVERRI 158 

55 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5591> which encodes the amino acid 
sequence <SEQ ID 5592>. Analysis of this protein sequence reveals the following: 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4252 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 147/175 (84%) , Positives = 158/175 (90%) 

Query: 12 LSIDDYRPTFVVEAVYDLRAEDLLRHGIRAVLVDLDNTLIAWNNPDGTAEVRAWLDEMTT 71 

+SIDDYRPT++VEA+YDLRA DLLRHGI AVLVDLDNTMAWNNPDGT EWAWLDEMT 
Sbjct: 20 MSIDDYRPTWWEAIYDLPJ^IDLLRHGITAVLVDLDNTLIAVMPDGTPEvRAWLDEMTI 79 

Query: 72 ADISVvWSNNNHARVERAVSRFGTOFVSRAKKPFTRGINMAIERYGFDRDEVIMVGDQL 131 

ADISWWSNN H+RVERAVSRFGVDF+SRA+KPF GI AI RYGFDR+EVIMVGDQL 
Sbjct: 80 ADISVWVSNNKHSRVERAVSRFGVDFISRALKPFAYGIEKAIARYGFDRNEVI^WGDQL 139 

Query: 132 MTDIRASHRAGIKSVLWPIVKSDAMiJTKFNRLRERRWKKIEENYGKIVYQKGI 186 

MTDIRASHRAGIKSVLVKP+V SDAWNTK NR RERRV K+EE YGK+ YQKSI 
Sbjct: 140 MTDIRASHRaGIKSVLVKPLVASDAWNTKINRVJRERRVMAKLEEKYGKLSYQKGI 194 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1799 

A DNA sequence (GBSx1906) was identified in S.agalactiae <SEQ ID 5593> which encodes the amino 
acid sequence <SEQ ID 5594>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1091 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 



MEELFCIGCGARIQTENKDAAGYTPRAALEKGLETGELYCQRCFRLRHYNEITDVHITDD 6 0 
ME4-+ CIGCG IQTE+K GX P A+L K + CQRCFRL++YNEI DV +TDD 

MEKWCIGCGVTIQTEDKTGLGYAPPASLTKE NVICQRCFRLKNYNEIQDVSLTDD 5 6 



- QW+ A E GL+PVDV L SA I+++ID IE YR+G+DVYWG TNVGKST I 



NAIIREITGSRDVITTSRFPGTTIjDKIEIPLDDGSYIFDTPGIIHRHQMAHYLTAKNLKY 240 
N II+E++G D+ITTS+FPGTTLD IEIPLDDGS ++DTPGII+ HQMAHY+ K+LK 
NRIIKEVSGEEDI ITTSQFPGTTLDAIEIPIiDDGSSLYDTPGI INNHQMAHYVNKKDLKI 236 







KH G+LLTPP E+ +FP+LV H FTIKD K DIV+SGLGW+ V + V A+A 
Sbjct: 297 EHffiGELLTPPGKDEMDEFPELVAHTFTIKDKKTDIWSGI£WVTVHDADKK---VTAYA 353 

Query: 360 PEGVAWLRKALI 372 
5 P+GV V +R++LI 

Sbjct: 354 PKGVHVFVRRSLI 366 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5595> which encodes the amino acid 
sequence <SEQ ID 5596>. Analysis of this protein sequence reveals the following: 

Possible site: 15 

>>> Seems to have an uncleavable N-term signal seq 

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the databases: 

>GP:CAB14509 GB:Z99117 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 220/373 (58%) , Positives = 286/373 (75%) , Gaps = 8/373 (2%) 

Query: 1 MEELFCIGCGIQIQTEDKEKAGFTPAAALKKGMETGELYCQRCFRLRHYNEITDVHITDD 60 

ME++ CIGCG+ IQTEDK G+ P A+L K + CQRCFRL++YNEI DV +TDD 

Sbjct: 1 MEKWCIGCGVTIQTEDKTGLGYAPPASLTKE NVICQRCFRLKNYNEIQDVSLTDD 56 



Query: 121 VTQWLTEE^EEGLRPLDVMLTSAQNICYAIKDLIGRINELRNGRDVYVVGVIWGKSTLI 180 

+ QW4 A E GL+P+DV L SA I RNG+DVYWG TNVGKST I 

Sbjct: 117 LIQWMKREAKELGLKPVDVFLVSAGRGQGIREVIDAIEHYRNGKDVY\rVGCTNVGKSTFI 176 

Query: 181 NAIIQEITGNKDVITTSRFPGTTLDKIEIPLDDGTFIFDTPGIIHRHQMAHYLSPKELKI 240 

N II+E4+G +D+ITTS+FPGTTLD IEIPLDDG+ ++DTPGII+ HQMAHY++ K+LKI 
Sbjct: 177 NRIIKEVSGEEDIITTSQFPGTTLDAIEIPLDDGSSLYDTPGIINNHQMAHYVNKKDLKI 236 

Query: 241 VSPKKElKPKTYQLNPEQTLFLGGIiARFDFINGERQGFTAFFDNQLELHRTKLAGADAFY 300 

+SPKKE+KP+T+QLN +QTL+ GGLARFD+++GER F + N+L +HRTKL ADA Y 
Sbjct: 237 LSPKKELKPRTFQLNDQQTLYFGGIARFDYVSGERSPFICYMPNELMIHRTKLENADALY 296 

Query: 301 DKHVGTLLTPPDKKELTAFPKLWHEFTI-DQKJVIDIVFSGLGWIRVNGQKDSKAIVAAWA 359 

+KH G LLTPP K E+ FP+LV H FTI D+K DIVFSGLGW+ V+ D+ V A+A 
Sbjct: 297 EKHAGELLTPPGKDEMDEFPELVAHTFTIKDKKTDIVFSGIX3WVTVH DADKKVTAYA 353 

Query: 360 PEGVAVIVRKAII 372 

P+GV V VR+++I 
Sbjct: 354 PKGVHVFVRRSLI 366 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 308/372 (82%), Positives = 343/372 (91%) 

Query: 1 MEELFCIGCGARIQTENKDAAGYTPRAALEKGLETGSLYCQRCFRLRHYNEITDVHITDD 60 

MEELFCIGCG +IQTE+K+ AG+T? AAL+KGtETGELYCQRCFRLRHYNEITDVHITDD 
Sbjct: 1 ' MEELFCIGCGIQIQTEDKEKAGFTPAAALKKGMETGELYCQRCFRLRHYNEITDVHITDD 60 

Query: 61 EFLKLLHEVGDSDALVVWIDIFDFNGSIIPGLSRFVAGNDVLLVGNKKDILPKSVKDGK 120 

EFL+LLHEVGDSDALVVNVIDIFDFNGSIIPGLSRF++G^VLLVGNKKDILPKSVKDGK 
Sbjct: 61 EFLRLLHEVGDSDALVVNVIDIFDFNGSIIPGLSRFISGNDVLLVGNKKDILPKSVKDGK 120 

Query: 121 VTQWIiTERAHEEGLRPVDVILTSAQNHHAIKDLIDTIEKYRHGQDVYVVGVTNVGKSTLI 180 

VTQWLTERAHEEGLRP+DV+LTSAQN +AIKDLI I + R+G+DVYWGVTNVGKSTLI 
Sbjct: 121 VTQWLTERAHEEGLRPLDVMLTSAQNKYAIKDLIGRINELRNGRDVYWGVTNVGKSTLI 180 



VSPKKEIKPKTYQLN EQTLFL GIARFDFI+G++QGFTA+FDN L LHRTKL GAD FY 



EGVAV++RKA+I 
EGVAVI VRKAI I 372 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1800 

A DNA sequence (GBSx1907) was identified in S.agalactiae <SEQ ID 5597> which encodes the amino 
acid sequence <SEQ ID 5598>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2948 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14507 GB:Z99117 similar to dihydrodipicolinate reductase 
[Bacillus subtilis] 
Identities = 49/97 (50%), Positives = 67/97 (68%), Gaps = 2/97 (2%) 

Query: 1 MLTSKQRAFLKSEAHSMKPIIQIGKNGIjNDQIKTSVRNALDARELIKVTLLQNTDEDIHD 60 

MLT KQ+ FL+S+AH + PI Q+GK G+ND + + AL+ARELIKV++LQN +ED +D 
Sbjct: 1 MLTGKQKRFLRSKAHHLTPIFQVGKGGVNDKMIKQIAEALEARKLIKVSVLQNCEEDKND 60 

Query: 61 VAEVLEDEIGCDTVLKIGRILILYKESARKENRKISV 97 

VAE L V IG ++LYKES KEN++I + 

Sbjct: 61 VAEALVKGSRSQLVQTIGNTIVLYKES - - KENKQIEL 95 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5599> which encodes the amino acid 
sequence <SEQ ID 5600>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2839 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 89/102 (87%) , Positives = 98/102 (95%) 



Query: 1 MLTSKQRAFLKSEAHSMKPIIQIGKNGI^QIKTSVRNALDARELIKvTLLQNTDEDIHD 60 

MLTSKQRAFLKSEAHS+KPI+QIGKNGLND IKTS+R ALDARELIKVTLLQNTDEDIH+ 
Sbjct: 1 MLTSKQRAFLKSFAHSLKPIVQIGKNGLNDHIKTSIRQALI^ELIKvTLLQNTDEDIHE 60 
Query: 51 VAEVLEDEIGCDTVLKI 

VAE+LE+EIGCDTVLKIGRIIiIIiYK SA+KENRK+S KVKA+ 
Sbjct: 51 VAEILEEEIGCDTVLKIGRILIDYKVSAKKENRKLSPKVKAI 102 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1801 

A DNA sequence (GBSx1908) was identified in S.agalactiae <SEQ ID 5601> which encodes the amino 
acid sequence <SEQ ID 5602>. Analysis of this protein sequence reveals the following: 

Possible site: 16 

>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = - 



Final Results 

bacterial membrane --- Certainty=0.2062 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 10089> which encodes amino acid sequence <SEQ ID 
10090> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



Query: 38 KQIGIMGGNENPVHNAHLWADQVRQQLCLDQVLLMPEFQPPHIDKKETIDEQHRLKMLE 97 

K+IGI GG F+P HN HL++A++V Q LD++ MP PPH ++ D HR++ML+ 
Sbjct: 2 KK1GIFGGTFDPPHNGHLLMANEVLYQAGLDEIWFMPNQIPPHKQNEDYTDSFHRVEMLK 61 

Query: 98 LA1EGIDGLSIEPIEIERKGISYTYDTMKLLIEKNPDVDYYFIIGADMVEYLPKWHRIDE 157 

LAI+ +E +E+ER+G SYT+DT+ LL ++ P+ +FIIGADM+EYLPKW+++DE 

Sbjct: 62 LAIQSNPSFKLELVEMEREGPSYTFDTVSLLKQRYPNDQLFFIIGADMIEYLPKWYKLDE 121 

Query: 158 L VKMVQFVGVQRPKYKAGTSYPVIWVDLPLMD I S S SMIRQF I KSNRQPNYLLPRE VLDYI 217 

L+ ++QF+GV+RP + T YP+++ D+P ++SS+MIR+ KS + +YL+P +V Y+ 
Sbjct: 122 LrJjmiQFIGVKRPGFHVErPYPLLFADVPEFEVSSraiRERFKSKKPTDYLIPDKVKKYV 181 

Query: 218 RKEGLYK 224 

, + GLY+ 
Sbjct: 182 EENGLYE 188 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5603> which encodes the amino acid 
sequence <SEQ ID 5604>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4660 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 172/210 (81%) , Positives = 196/210 (92%) 

Query: 15 MALELLTPFTKVELEEKKRDTNRKQIGIMGGNENPVHNAHLWADQVRQQLCLDQVLLMP 74 

MALELLTPFTKVELEE+K+++NRKQ1GI+GGNFNP+HSIAHLWADQVRQQL LDQVLLMP 
Sbjct: 1 ^ELLTPFTKVELEEEKICESISKKQIGILGGNFNPIHNAHLVVADQVRQQLGLDQVLLMP 60 
Query: 75 EFQPPHIDKKETIDEQHRLKMLELAIEGIDGLSIEPIEIERKGISYTYDTMKLLIEKNPD 134 

E +PPH+D KETIDE+HRL+MLELAIE ++GL+IE E+ER+GISYTYDTM L E++PD 
Sbjct: 61 ECKPPHVDAKETIDEKHRLRMLEriAIEDVEGIAIETCEIiERQGISYTYDTMLYLTEQHPD 120 

Query: 135 VDYYFIIGADMVEYLPKWHRIDELVKMVQFVGVQRPKYKAGTSYPVIWVDLPLMDISSSM 194 

VD+YFIIGADMV+YLPKWHRIDELVK+VQFVGVQRPKYKAGTSYPVIWVDLPL+DISSSM 
Sbjct: 121 VDFYFIIGADMVDYLPKWHRIDELVKLVQFVGVQRPKYKAGTSYPVIWVDLPLIDISSSM 180 

Query: 195 IRQFIKSNRQPNYLLPREVLDYIRKEGLYK 224 

IR FIK RQPNYLLP+ VLDYI +EGLY+ 
Sbjct: 181 IRDFIKKGRQPNYLLPKRVLDYITQEGLYQ 210 
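The "Identities" figure quoted for an alignment is the number of aligned columns holding the same residue, over the alignment length. The sketch below is illustrative only (the function names are hypothetical) and recounts identities for the last block shown above. It also uses integer truncation for the percentage, since the identity percentages printed in these listings appear to be truncated rather than rounded (172/210 is 81.9%, reported as 81%). Positives are not recomputed, because that would require the substitution matrix used for the original search.

```python
def count_identities(query, sbjct):
    """Return (identical columns, alignment length) for two aligned strings."""
    if len(query) != len(sbjct):
        raise ValueError("aligned strings must have the same length")
    # A column counts as identical only if both residues match and are not gaps.
    identical = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    return identical, len(query)


def percent_truncated(count, total):
    """Percentage with the fractional part dropped, as in the printed headers."""
    return (100 * count) // total


query_block = "IRQFIKSNRQPNYLLPREVLDYIRKEGLYK"  # Query residues 195-224
sbjct_block = "IRDFIKKGRQPNYLLPKRVLDYITQEGLYQ"  # Sbjct residues 181-210

identical, length = count_identities(query_block, sbjct_block)
print(identical, length, percent_truncated(identical, length))
```

Summing the per-block counts over a whole alignment should reproduce the header figure when the residue lines are intact.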

SEQ ID 5602 (GBS651) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 132 (lane 8-10; MW 53.3kDa) and in Figure 186 (lane 8; MW 53kDa). It was 
also expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 
132 (lane 12; MW 28.4kDa) and in Figure 140 (lane 11; MW 20kDa). 

Purified GBS651-GST is shown in Figure 243, lane 4; purified GBS651-His is shown in Figure 229, lane 9. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1802 

A DNA sequence (GBSx1909) was identified in S.agalactiae <SEQ ID 5605> which encodes the amino 
acid sequence <SEQ ID 5606>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4281 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14505 GB:Z99117 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 79/180 (43%) , Positives = 115/180 (63%) 



Query: 69 FLRLIDKYQPDPDLKKWGNNIWHGLVGIYKIQEDLAIKDQDILAAIAKHTVGSAQMSTLD 128 

++I + + L +WH VG Y +Q + ++D+DIL AI HT G M+ L+ 

Sbjct: 61 MKQIIAREKMPAHI.LDHNPELWHAPVGAYLVQREAGVQDEDILDAIRYHTSGRPGMTLLE 120 

Query: 129 KIVTVADYIEHISIRDFPGWEARELAKVDLNKAVAYETARTVAFIASKAQPIYPKTIETYN 188 

K++YVADYIE NR FPGV+E R+LA+ DUSI+A+ T+ FL K QP++P T TYN 

Sbjct: 121 KVIYVADYIEPM^PGVDEVWaaVKTOINQ^ 180 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5607> which encodes the amino acid 
sequence <SEQ ID 5608>. Analysis of this protein sequence reveals the following: 

Possible site: 41 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2615 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 130/194 (67%) , Positives = 159/194 (81%) 

Query: 1 MTYKDYTGLDRTELLSKTOH^SDKRFNHVIGVERAAIEIjAERYGYDKE 60 

MTY+DY RTELL+K+ MS KRF HVLGVE+AA+ LAE YG + +KAGLAALLHDY 
Sbjct: 1 MTYEDYLPYSRTELMKIAEQMSPKRFKflVlGWKAALSIAECYGCNPDKRGLAALLHDY 60 

Query: 61 AKELSDDEFLRLIDKTQPDPDLmVG^ItVHGLVGIYKIQEDLAIKDQDILAAIflKHTVG 120 

AKE D FL LIDKYQ P+L KW NN+WHG+VGIYKIQEDL +KD+DIL AI HTVG 
Sbjct: 61 AKECPDQVFLDLIDKYQLSPELAKWNNNVWHGMVGIYKIQEDLGLKDKDILRAIEIHTVG 120 

Query: 121 SAQMSTLDKIVWADYIEHNRDFPGVEE^REIjAKVDLNKAVAYETARTVAFLiASKAQPIY 180 

+A+M+ LDK++YVADYIE R FP V++AR++AK+DM4-AVAYET TVA+LASKAQPI+ 
Sbjct: 121 AAEMTLLDKVLWADYIEEGRIFPLVDDARKIAI<1DI^QAVAYETVHTVAYLASKAQPIF 180 

Query: 181 PKTIETYNAYIPYL 194 

P+T++TYNA+ YL 
Sbjct: 181 PQTLDTYNAFCSYL 194 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1803 

A DNA sequence (GBSx1910) was identified in S.agalactiae <SEQ ID 5609> which encodes the amino 
acid sequence <SEQ ID 5610>. Analysis of this protein sequence reveals the following: 

Possible site: 56 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -2.34 Transmembrane 12 - 28 ( 10 - 28) 

Final Results 

bacterial membrane --- Certainty=0.1935 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 
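Where a membrane-spanning segment is reported, the "INTEGRAL" line gives a likelihood score and two residue ranges. A minimal sketch for extracting those fields follows. It is illustrative only: the function name parse_tm_segment is hypothetical, and reading the range in parentheses as an extended boundary around the core segment is an assumption, not something stated in the listing.

```python
import re

# "INTEGRAL Likelihood = -2.34 Transmembrane 12 - 28 ( 10 - 28)"
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+"
    r"Transmembrane\s+(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_tm_segment(line):
    """Return the fields of one complete INTEGRAL line, or None if truncated."""
    m = TM_RE.search(line)
    if m is None:
        return None
    core = (int(m.group(2)), int(m.group(3)))
    extended = (int(m.group(4)), int(m.group(5)))  # assumed meaning, see lead-in
    return {
        "likelihood": float(m.group(1)),
        "core": core,
        "extended": extended,
        "core_length": core[1] - core[0] + 1,
    }


segment = parse_tm_segment(
    "INTEGRAL Likelihood = -2.34 Transmembrane 12 - 28 ( 10 - 28)"
)
print(segment)
```

Lines whose score or ranges were lost in scanning return None, so incomplete records are skipped rather than guessed.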

A related GBS nucleic acid sequence <SEQ ID 10087> which encodes amino acid sequence <SEQ ID 
10088> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



Query: 22 ALLLIDIQQGIMDKK- -PKHLTNFAVLLDDLDLSAKGSNCEVIWIRHHDKE LPQGS 75 

AL+L+D QQG D ++ + ++LL + + + + +RH+ E L QG 

Sbjct: 7 ALVLVDFQQGFADEAWGDRNNPDAEAHAEELIiAAWRDAAAPIAHVRHNSTEATSPLRQGE 56 

Query: 76 PQWEIWEQRHLVTHHKIIDKTYNSCFKDTELHDYLQSKHISQLIMMGLQTEYCFDTSVKV 135 

P + + K+ N F DT L +L+ + L++ GL T++C T+V++ 

Sbjct: 67 PGFAYTDG^PAADEPEFVKSVNGAFTOTALEGWLRDRDTGSLWCGLTTDHCVSTTVRM 126 

Query: 136 AFEYGYDIFIPQGGHLTFDTPTLSGDSIKK HYENIWHHR--FATMUAKDSLL 185 

A G+D+ + + TDTLG++ H +HR FAT+ ++L 

Sbjct: 127 ADNRGFDVTLVRDATATHDR-TLDGERLPPS\^RTALlAHLRGEFATLATTATVL 180 

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 5610 (GBS652) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 133 (lane 2 & 3; MW 49.7kDa + lane 4; MW 27kDa) and in Figure 186 (lane 9; 
MW 50kDa). It was also expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 133 (lane 5 & 7; MW 24.8kDa) and in Figure 178 (lane 10; MW 25kDa). 
Purified GBS652-GST is shown in Figure 243, lane 9; purified GBS652-His is shown in Figure 229, lane 
10. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1804 

A DNA sequence (GBSx1911) was identified in S.agalactiae <SEQ ID 5611> which encodes the amino 
acid sequence <SEQ ID 5612>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0945 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MTEKDLLQLWKaADEKRAEDIVILDLQPVTSVADYFVIMSASNSRQLEAIADNIREQVK SD 

M +K +L++ A D+KRAEDI+ LD-M- ++ VADYF+I ++ +Q++AIA I++Q 
Sbjct: 1 mvIQKSlLKIAAAACDDKRAEDILALDMEGISLVADYFLICHGNSDKQVQAIAREIKDQAD 60 

Query: 61 GNGGDASHLEGDSKAGWVLLDIJ^SWvHIFSEDERQHYNLEKLWHEAPLLDAEVFMTE 118 
NG +EG +A WVL+DL WVH+F +DER +YNLEKLW +APL D + M + 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5613> which encodes the amino acid 
sequence <SEQ ID 5614>. Analysis of this protein sequence reveals the following: 

Possible site: 50 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.69 Transmembrane 91 - 107 ( 91 - 107) 

Final Results 

bacterial membrane --- Certainty=0.1277 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

>GP:CAB14504 GB:Z99117 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 55/113 (48%) , Positives = 80/113 (70%) 

Query: 17 MKKEELLKIVVEATEEKRAKDILALDLEGLTSLTDYFVIASATNSRQLEAIADNIREKVK 76 
45 M ++ +LKI A ++KRA+DILALD+EG++ + DYF+I + +Q++AIA I+++ 

Sbjct: 1 MNQKSILKIAAAACDDKRAEDILALDMEGISLVADYFLICHGNSDKQVQAIAREIKDQAD 60 

Query: 77 EAGGDASHVEGNSQAGWVLLDLTDVVTOLFLEDERYHYNLEKLWHEAPAVALD 129 
E G +EG +A WVL+DL DWVH+F +DER +YNLEKLW +AP LD 

50 Sbjct: 61 ENGIQVKKMEGFDEARWVLVDLGDWVHVFHKDERSYYNLEKLWGDAPLADLD 113 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 78/116 (67%), Positives = 100/116 (85%) 

Query: 1 MTEKDLLQLWKAADEKRAEDIVILDLQPVTSVADYFVIMSASNSRQLEAIADNIREQVK 60 

Query: 61 GNGGDASHLEGDSKAGWVLLDLNSWVHIFSEDERQHYNLEKLWHEAPLIiDAEVFM 116 

GGDASH+EG+S+AGWVLLDL WVH+F EDER HYNLEKLWHEAP + + ++ 
Sbjct: 77 EAGGDASHVEGNSQAGWVLLDLTDVWHLFLEDERYHYNLEKLVIHEAPAVALDAYL 132 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1805 

A DNA sequence (GBSx1912) was identified in S.agalactiae <SEQ ID 5615> which encodes the amino 
acid sequence <SEQ ID 5616>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2415 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1806 

A DNA sequence (GBSx1913) was identified in S.agalactiae <SEQ ID 5617> which encodes the amino 
acid sequence <SEQ ID 5618>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1570 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14503 GB:Z99117 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 66/242 (35%) , Positives = 154/242 (63%) , Gaps = 4/242 (1%) 

Query: 4 YETFAAvYDAVMDDTLYAKWTDFSLRHFPKGKKKLLELACGTGIQSVRFAQAGYAVTGLD 63 

Y+ FA+VYD +M Y +WT + P+ K ++L+LACGTG S+R A+ G+ VTG+D 

Sbjct: 3 YCGFASVYDELMSHAPYDQWTKWIEASLPE-KGRlLDLACGTGEISIRIiAEKGFEVTGID 61 

Query: 64 LSGDMLKLAKKRATSAHQSIQFIEGNMLDLSNV-GKYDLITCYSDSICYMQDEVEVGDVF 122 

LS +ML A+-J-+ +S+ Q I F-H- +M +++ G-H-D + DS+ Y++ + +V + F 
Sbjct: 62 LSEEMLSFAQQKVSSS-QPILFLQQDMRE1TGFDGQFDAWICCDSLNYLKTKNDVIETF 120 

Query: 123 IEVYKALEENGVFIFDVHSTYQTDKVFPGYSYHENADDFAMVWDTYEDDAPHSIVHELTF 182 

V++ L+ G+ +FDVHS+++ +VFE ++ + +D + +W ++ S++H+++F 
Sbjct: 121 KSVFRVLKPEGILLFDVHSSFKIAEVFPDSTFADQDEDISYIWQSFAGSDELSVIHDMSF 180 

Query: 183 FVQEEDGRFTRHDEVHEERTYDILTYDILLECAGFKDVKVYADFEDKKPTATSARWFFVA 242 

FV + + R DE HE+RT+ + Y+ +L+ GF+ +V ADF D +P+A S R FF A 
Sbjct: 181 FVWNGEA-YDRFDETHEQRTFP\?EEYEEMLKNCGFQLHRVTADFTDTEPSAQSERLFFKA 239 
Query: 243 HK 244 
K 

Sbjct: 240 QK 241 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5619> which encodes the amino acid 
sequence <SEQ ID 5620>. Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2315 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 191/243 (78%) , Positives = 215/243 (87%) , Gaps = 2/243 (0%) 

YETFAAVYDAVMDDTLYAKWTDFSLRHFPK- -GKKKLLELACGTGIQSVRFAQAGYAVTG 61 
YE FA+VYDAVMDD+LY WTDFSLRH PK G+ +LLELACGTG I QSVRFAQAG+ VTG 
VEKFASWDAvMDDSLYDLWTDFSLRHDPKSKGRNRLLELACGTGIQSVRFAQAGFDVTG 8 0 

LDLSGDMLKLAiaCRATSAHQSIQFIEGNMLDLSNVGKYDLITCYSDSICYMQDEVEVGDV 121 
LDLS DML + AKKRA SA + I FI+GNMLDLS VG++D +TCYSDS I CYMQDEV+VGDV 



FF+QE+DGRF+R DEVHEERTY++LTYDILLEQAGFK KVYADFEDK+PT TS RWFFV 



261 AYK 263 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1807 

A DNA sequence (GBSx1914) was identified in S.agalactiae <SEQ ID 5621> which encodes the amino 
acid sequence <SEQ ID 5622>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3538 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06304 GB:AP001516 unknown conserved protein [Bacillus halodurans] 
Identities = 129/367 (35%) , Positives = 184/367 (49%) , Gaps = 45/367 (12%) 



Query: 1 MTVTGIVAEFNPFHNGHKYLLEQAQ GIKVIAMSGNFMQRGEPAIVDKWTRSQMAL 55 

M G+V E+NPFHNGH + L 4-A+ + + MSG F+QRGEPAI+ KW R+ +AL 

Sbjct: 1 MKAVGWVEYNPFHNGHLHHLTEARKQAKADWIAVMSGYFLQRGEPAILPKWERTSLAL 60 



W+ F LLKY+++T 4 



TKRYT RI+R T++ N 



A related DNA sequence was identified in S. pyogenes <SEQ ID 5623> which encodes the amino acid 
sequence <SEQ ID 5624>. Analysis of this protein sequence reveals the following: 

Possible site: 33 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.3165 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 221/359 (61%) , Positives = 288/359 (79%) 

Query: 1 MTVTGIVAEFNPFHNGHKVLLEQAQGIIWIAMSGNFMQRGEPAIVDKWTRSQMALENGAD 60 

MTVTGI +AEFNPFHNGHKYLLE A+G+K+IAMSGNFMQRGEPA++DKW RS+MAL+NGAD 
Sbjct: 1 MTVTGIIAEFNPFHNGHKYLLETAEGLKIIAMSGNFMQRGEPALIDKWIRSEMALKNGAD 60 

Query: 61 LVIELPFLVSVQSADYFASGAVSILARLGVDNLCFGTEEMLDYARIGDIYVNKKEEMEAF 120 

+V+ELPF VSVQSADYFA GA+ IL +LG+ L FGTE ++DY ++ +Y K E+M A+ 
Sbjct: 61 IWELPFFVSVQSADYFAQGAIDILCQLGIQQIAFGTENVIDYQJCLIKVYEKKSEQMTAY 120 

Query: 121 LKKQSDSLSYPQKMQAMWQEFAGITFSGQTPNHILGLAYTKAASQNGIRLNPIQRQGAGY 180 

L D+ SYPQK Q MW+ FAG+ FSGQTPNHI LGL+ Y KA++ I+L PI+RQGA Y 
Sbjct: 121 LSTLEDTFSYPQKTQKMWEIFAGVKFSGQTPNHILGLSYAKASAGKHIQLCPIKRQGAAY 180 

Query: 181 HSSEKTEIFASATSLRKHQSDRFFVEKGMPNSDLFLNSPQWWQDYFSLLKYQIMTHSDL 240 

HS +K + ASA+++R+H +D F+ +PN+ L +N+P + W YFS LKYQI+ HSDL 
Sbjct: 181 HSKDKNHLIASASAIRQHLNDMJFISHSVPNAGLLINNPHMSVTOHYFSFLKYQIljNHSDL 240 

Query: 241 TQIYQVNEEIANRIKSQIRYVETVDELVDKVATKRYTKARIRRLLTYILINAVESPIPNA 300 

T I+QVN+E+A+RIK 1+ + +D LVD VATKRYTKAR+RR+LTYIL+NA E +P 
Sbjct: 241 TSIFQVNDEIASRIKKAIKVSQNIDHLVDTVATKRYTKARVRRILTYILVNAKEPTIjPKG 300 

Query: 301 IHVLGFTQKGQX3HLKSWKSVDIVTRIGSQTWDSLTQRADSVYQMGNANIAEQTWGRIP 359 

IH+LGFT KGQ HLK +KKS ++TRIG++TWD +TQ+ADS+YQ+G+ +1 EQ++GRIP 
Sbjct: 301 IHILGFTSKGQAHLKKLKKSRPLITRIGAETl-JDEMTQKADSIYQLGHQDIPEQSFGRIP 359 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1808 

A DNA sequence (GBSxl915) was identified in S.agalactiae <SEQ ID 5625> which encodes the amino 
acid sequence <SEQ ID 5626>. This protein is predicted to be transcriptional activator tipa. Analysis of this 
protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3117 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 4 VKEISHISGISVRTLHYYDEIDLLSPSFVGENGYRYYDDESLIKLQEILLFKELEFPLKK 63
VK+++ ISG+S+RTLH+YD I+LL+PS + + GYR YD L +LQ+IL FKE+ F L +
Sbjct: 5 VKQVAEISGVSIRTLHHYDNIELLNPSALTDAGYRLYSDADLERLQQILFFKEIGFRLDE 64

IKE++D PN+DR AL Q L KKQR++E+I+

+++S +D+ I

P D ++Q V +D+I Y+CT D+ LG +YI DERF SI+ + G+G A F+ 4



There is also homology to SEQ ID 1712. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 1809 

A DNA sequence (GBSx1916) was identified in S.agalactiae <SEQ ID 5627> which encodes the amino
acid sequence <SEQ ID 5628>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2590 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB14597 GB:Z99117 yrkC [Bacillus subtilis]
Identities = 56/129 (43%), Positives = 74/129 (56%), Gaps = 7/129 (5%) 



Query: 2 KGFHGNIEKLTLGNTNFRQVLYTAEHCQL\rDMTLPVGGEIGSEIHAENDQFFRFEAGHGK 61

K F NI + T N FR L+T +H Q+ LM+L +G +IG EIH DQF R E G G
Sbjct: 59 KPFVVNII^TKQNm'FRTM,WTGKHFQVTJlMSLGIGEDIGLEIHPNVDQFLRIEQGRGI 118

Query: 62 WIDGN EYEVADGDAIIVPAGAEHNVINTSETEMLKLyTIYSPAHHKDGIIRaT 115 

Sbjct 

Query: 116 I 

+ +A E+ 
Sbjct: 178 KADAVAAED 186 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 1810 

A DNA sequence (GBSx1917) was identified in S.agalactiae <SEQ ID 5629> which encodes the amino
acid sequence <SEQ ID 5630>. This protein is predicted to be glycerol uptake facilitator (glpF). Analysis of
this protein sequence reveals the following:

Possible site: 61

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -9.08 Transmembrane 
INTEGRAL Likelihood = 
INTEGRAL Likelihood = 
INTEGRAL Likelihood = 
INTEGRAL Likelihood = 
INTEGRAL Likelihood = 



Transmembrane 135 - 151 ( 132 - 155 
6 - 102 ( 80 - 103 



76 

Final Results

bacterial membrane --- Certainty=0.4630 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MTQFLGEFLGTFILVLLGDGWAGNVLSKTKEEGTGWTAIVFGWGIACTVAVYVSGLFSP 60, 

M+ FLGE +GT IL++LG GWAG VL TK E GW I WG+A AVY G S 
Sbjct: 1 MSPFLGEVIGTMILIILGGGWAGWLKGTKSENGGWIVITAAWGLAVATAVYCVGQISG 60 

Query: 61 AHLNPAVTLAMASIGAISWGQVI PFIIAQMLGAMVAATILWLHYYPHWKETKDSGLILAS 120 

AHLNPAVT+ +A +GA W QV +I+AQMLGAM+ AT+++LHYYPH+K T+D G LA 
Sbjct: 61 AHLNPAOTIGLALVGAFEWSQVAGYIVAQMLGAMIGATLW 120 

Query: 121 FSTGPAIRHTPSNLLGEIIGTAILVITIMAIGPSKVARGLGPIIVGIVIFAVGFSLDPTT 180 

FST PAI+H P+N E++GT +LV+ I+AIG ++ GL P+IVG++I +G SL TT 
Sbjct: 121 FSTDPAIKHLPANFFSEVLGTFVLVLGILAIGANEFTEGLNPLIVGLLIWIGLSLGGTT 180 

GYAINPARDLGPR+ H +LPI KG+S+WSYAWIP+VGPIIGG +GA+ Y 
Sbjct: 181 GYAINPARDLGPRIAHFLLPIPGKGSSNWSYAWIPIVGPIIGGGIGALTY 230 

There is also homology to SEQ ID 2854. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics.

Example 1811

A DNA sequence (GBSx1918) was identified in S.agalactiae <SEQ ID 5631> which encodes the amino
acid sequence <SEQ ID 5632>. Analysis of this protein sequence reveals the following:

Possible site: 37
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1594 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB07114 GB:AP001518 unknown conserved protein in others
[Bacillus halodurans]
Identities = 64/118 (54%), Positives = 85/118 (71%)

Query: 5 C3IIWSHSKNIAQGWDLISEVAKDVSITYVGGTEDGEIGTSFDQVQQIVEQNDKKTLLA 64
GI++ SH +A+G+V L+ E AKDVSITY GGT+D ++G SF4++QQ V N+ h
Sbjct: 7 GIVISSHVPALAEGIVTLLKEAAKDVSITYAGGTDDDQVGASFEKIQQAVMDNEADELFV 66

Query: 65 FFDLGSAKMNLELVADFSEKNI I IMS VP WEGAYTAAALLQAGADLDS IQSQLAELTI 122
F+DLGSAKMN+E+V + SEK I + V +VEGAYTAAAL Q GA ++I QL LTI
Sbjct: 67 FYDLGSAKMNVE^IVMELSEKTIHLMDVALVEGAYTAAALTQGGASFETIMEQLQPLTI 124

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 



Example 1812 

A DNA sequence (GBSx1919) was identified in S.agalactiae <SEQ ID 5633> which encodes the amino
acid sequence <SEQ ID 5634>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4753 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB07115 GB:AP001518 unknown conserved protein in others
[Bacillus halodurans]
Identities = 98/190 (51%), Positives = 135/190 (70%), Gaps = 2/190 (1%) 



Sbjct: 4 VEOTTKWLHAFHEKVOANQSYLSELDSAIGDGDHGITO^GI^VERKLKENLFESPQEV 63 

Query: 63 FKTVSMQLLSKVGGASGPLYGSAFMGITK-AEQSKSTISEALGAGLEMIQKRGKAELNEK 121

K +M L+SK GGASGPLYG+A + ++K I +++ AGL I KRGKA EK 

Sbjct: 64 LKMAAmLISKTGGASGPLYGTALLEMSKQVANDPQNIGKSIEAGLNGILKRGKATTGEK 123 

Query: 122 TMVDVWHGVIEAI-EKNELTEDRIDSLVDATKGMKATKGRaSYVGERSVGHIDPGSFSSG 180 

TMVD+W V+E++ + +L+++RI V TK MKATKGRASY+GERS+GH+DPG+ SSG 
Sbjct: 124 T^IWKPVVESL^QQLSKERICjQFVSETKEMKATKGRASYLGERSLGHLDPGAVSSG 183 

Query: 181 LLFKALLEVG 190
Sbjct: 184 YLFEAMIDGG 193

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 
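
The homology summaries in these examples follow a fixed pattern ("Identities = 98/190 (51%), Positives = 135/190 (70%), Gaps = 2/190 (1%)"). A minimal sketch of how such a summary line might be read into numbers is given below. The function name `parse_alignment_summary` and the dictionary layout are illustrative assumptions, not part of the original analysis, and the percentages are stored exactly as reported rather than recomputed.

```python
import re

# Assumed layout of a summary line, taken from the examples in this text:
#   "Identities = 98/190 (51%), Positives = 135/190 (70%), Gaps = 2/190 (1%)"
STAT_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_alignment_summary(line):
    """Return {statistic: {count, length, reported_percent}} for one summary line.

    Hypothetical helper: statistics missing from the line (for example "Gaps"
    in an ungapped alignment) are simply absent from the result.
    """
    stats = {}
    for name, hits, length, percent in STAT_RE.findall(line):
        stats[name.lower()] = {
            "count": int(hits),
            "length": int(length),
            "reported_percent": int(percent),
        }
    return stats


summary = "Identities = 98/190 (51%), Positives = 135/190 (70%), Gaps = 2/190 (1%)"
stats = parse_alignment_summary(summary)
print(stats["identities"])
```

The regular expression tolerates the stray spaces that appear before commas and inside parentheses in some summary lines of this text.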

Example 1813 

A DNA sequence (GBSx1920) was identified in S.agalactiae <SEQ ID 5637> which encodes the amino
acid sequence <SEQ ID 5638>. This protein is predicted to be dihydroxyacetone kinase (M200). Analysis
of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2080 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB07116 GB:AP001518 dihydroxyacetone kinase [Bacillus halodurans] 
Identities = 204/329 (62%) , Positives = 261/329 (79%) 

Query: 1 MKKII^QPTDWTEMLDGLAYVHNDLVHRIEGFDIIARNEEKSGKVALISGGGSGHEPSH 60 

MKKILN P +V+ EMLDG Y + LV R+ G +1 R E GKVAL+ SGGGSGHEPSH 
Sbjct: 1 MKKILNDPQNVLDEMLDGFVYANGHLVERVAGTGVIRRTYEDKGKVALVSGGGSGHEPSH 60 

Query: 61 AGWGEGMLSAAVCGAVFTSPTPDQVLEAIKEADEGAGVFMVIKNYSGDIMNFEMAQDMA 120 

AGFVG+GMLSAAVCG VFTSPTPDQ+ E IK AD+G GV ++IKNY+GD+MNFEMA +MA 
Sbjct: 61 AGFVGQGMLSAAVCGEVFTSPTPDQIFEGIKAADCjGGGVLLIIKNYTGDVMNFEMAGEMA 120 

Query: 121 EMEGIEVASVVVDDDIAVEDSriYTQGKRGVAGTILVHKILGHAARHGKSLQEIKAIADEL 180 

E EGI V ++V+DDIAVEDS +T G+RGVAGTI +VHKI +G AA G SLQ +K + + + 
Sbjct: 121 FJffiGITVDHIIVNDDIAVEDSSFTAGRRGVAGTIIVHKIVGAAAEAGLSLQSLKVLGETV 180 

+ N T+G+++ ATVP VGKPGF h +DE+E+G+GIHGEPGYRKEK+4- SK +A EL+ 
Sbjct: 181 1ENTKTIGVSILPATVPAVGKPGFELGDDEMEYGVGIHGEPGYRKEKLKSSKEIAEELIL 240 

Query: 241 KLIESFDAKSGEKYGVLINGMGATPLMEQYVFANDVAKLLEDKGIEVNYKKLGNYMTSID 300 

KL E+F G+KYGVL+NG+GATPLMEQYVF NDVA L ++G+ + +KK+G++MTSID 
Sbjct: 241 KLKEAFGWSKGDKYGVLWGLGATPMEQYVFMNDVANKLTEEGLNIQFKKVGSFMTSID 300 

Query: 301 MAGLSLTLIKLENQEWLEALNSDVTTIAW 329 

MAG+SLTLIK+ ++WL+ N +V T+ W 
Sbjct: 301 MAGVSLTLIKIVEEKMjDYWNHEVKTVDW 329 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1814 

A DNA sequence (GBSx1921) was identified in S.agalactiae <SEQ ID 5639> which encodes the amino
acid sequence <SEQ ID 5640>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1997 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB07113 GB:AP001518 unknown [Bacillus halodurans] 
Identities = 59/142 (41%), Positives = 82/142 (57%), Gaps = 5/142 (3%) 

Query: 1 MTSSLlTKKKIAKSFKRLFISQAFDKISVSDIMEDAGIRRQTFYNHFVDKyALIjEWIFQT 60 

MT+S+ITKK IAK+FK L Q F KISVSDIM A +RRQTFY HF DK+ Lh WI++ 
Sbjct: 1 MTNSIITKKVIAKAFICDLMEVQPFSKISVSDIMNRANMRRQTFyYHFQDKFELLHWIYKQ 60 

Query: 61 ELSEQVTDNLDYISGFQLLSELLTFFKMNQEFYIKLFQIEDQNDFSSYFESYCEQLVDKL 120 

EE D h Y + L+ +F NQ FY + + QN F+ Y + + L 
Sbjct: 61 ETKEHS IDFLAYDD IHT I FRHLMHYFYENQTFYQRAIWVNGQNGFTDYLYEHI QTL Y 117 

Query: 121 LSDYSKSNFNQKERVTFINYHS 142 

L++ + +QK+R +++S 
Sbjct: 118 LNEIDRR- -SQKDREFISSFYS 137 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5641> which encodes the amino acid 
sequence <SEQ ID 5642>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2101 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 31/115 (26%) , Positives = 58/115 (49%) , Gaps = 6/115 (5%) 

Query: 7 TKKKIAKSFKRLFISQAFDKISVSDIMEDAGIRRQTFYNHFVDKYALLEWIFQTEIiSEQV 66 

TK + + L Q+F+ ++VSD+ + AGI R TFY H+ DK+ ++ F+ + + + 
Sbjct: 8 TKAWKTALTTLLTEQSFETLWSDLTKKAGimGTFYLHYTDKFDMMrai-FKNDTIiDDL 66 

Query: 67 TDNLD YISGFQDLSELLTFFKMNQEFYIKLFQISDQNDFSSYFESYCEQIjV 117 

L+ Y Q+L4-+ L++ ++EF LI F + +C Q + 

Sbjct: 67 YRLLNQAEIYTDTRQVLNQTLSYLIEHREFITAIATI-SYLKFPQLIKDFCYQFL 120 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1815 

A DNA sequence (GBSx1922) was identified in S.agalactiae <SEQ ID 5643> which encodes the amino
acid sequence <SEQ ID 5644>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1974 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics. 
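
Each "Final Results" block in these examples lists one certainty value per compartment (cytoplasm, membrane, outside), and the compartment with the highest value is the one marked "Affirmative". A minimal sketch of how such a block might be read programmatically is given below. The helper names `parse_final_results` and `predicted_location` are illustrative assumptions, and the line layout is assumed from the blocks in this text written in cleaned-up form; nothing here reproduces the prediction program itself.

```python
import re

# Assumed line layout (cleaned-up form of the blocks in this text):
#   "bacterial cytoplasm --- Certainty=0.1974 (Affirmative) < succ>"
# The dashes are optional because some blocks are printed without them.
RESULT_RE = re.compile(
    r"bacterial\s+(?P<site>\w+)\s*-*\s*"
    r"Certainty\s*=\s*(?P<score>\d+\.\d+)\s*"
    r"\((?P<call>[^)]*)\)"
)


def parse_final_results(block):
    """Return a list of (compartment, certainty, call) tuples, one per result line."""
    results = []
    for line in block.splitlines():
        match = RESULT_RE.search(line)
        if match:
            results.append(
                (match.group("site"), float(match.group("score")), match.group("call").strip())
            )
    return results


def predicted_location(block):
    """Return the result tuple with the highest certainty, or None if nothing parsed."""
    results = parse_final_results(block)
    return max(results, key=lambda item: item[1]) if results else None


example_block = """\
bacterial cytoplasm --- Certainty=0.1974 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""
print(predicted_location(example_block))
```

When every compartment scores 0.0000, as in some blocks of this text, `max` returns the first line listed, so a caller would need to inspect the "Not Clear" call rather than rely on the ordering.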

Example 1816 

A DNA sequence (GBSx1923) was identified in S.agalactiae <SEQ ID 5645> which encodes the amino
acid sequence <SEQ ID 5646>. This protein is predicted to be dihydroxyacetone kinase (M200). Analysis
of this protein sequence reveals the following:

Possible site: 55

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1806 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB07112 GB:AP001518 dihydroxyacetone kinase [Bacillus halodurans] 
Identities = 141/285 (49%) , Positives = 197/285 (68%) , Gaps = 1/285 (0%) 



Query: 45 IPILSGGGSGHEPAHFGYVGEGMLSAAISGPIFVPPCASDILETIRFINRGKGVFVIIKN 104
+PI+SGGGSGHEP H GYVGEGML+AA+ G +FVPP A +L IR +++GKGV +IIKN
Sbjct: 46 VPI ISGGGSGHEPGHLGWGEGMLAAAVHGDWVPPSAQQVLAAIRQMDQGKGVLLIIKN 105

Query: 105 FEADLEEFSQAIEQARQEGIPIKYIVSHDDISVET-SNFKIRHRGVAGTVLLHKIIGQAA 163
F SDL F A OAR EG + +++ +DD+SVE+ ++F+ R RGVAG VL+HKIIG AA
Sbjct: 106 FvADIATFLSAEVQARAEGRDVAOTIVNDDVSVESDASFEKRRRGVAGAVLVHKIlGAAA 165

Query: 164 LEGASLDELEQLGLSLTTSMATLGVASKSATILGQHQPVFDIEEGYISFGIGIHGEPGYR 223
EG SL+ Ii+++G + ++ATLGVA A + + +P F +EEG + FG+GIHGE GYR
Sbjct: 166 KEGYSLEALQEIGEQVVKNLATLGVALTHADLPERREPQFLLEEGEVyFGVGIHGEQGYR 225

Query: 224 TMPFVSMEHIANELTOKLKMKLRWQDGEAFILLINNLGGSSKMEELLFTNAVMEFLALDD 283
VS E LA ELVNKLK RW + + +LIN LGG+ +E+ +F N V LA+++
Sbjct: 226 KEIOiVSSEBIiAVELTOKIiKSLYRWDKNDQYAVLINGLGGTPLIEQYVFANDVRRLLAIEN 285

Query: 284 LQLPFIKTGHLITSLDMAGLSVTLCRVKDSRW1DYLKHKTDARAW 328
L + F+K G +TSL+M G+S+T+ ++ D +W+ +L D W
Sbjct: 286 LHVSFVKVGTQLTSLNMKG I SLTMLKICEEQWVKWLYAPVDVAHW 330



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.

Example 1817 

A DNA sequence (GBSx1924) was identified in S.agalactiae <SEQ ID 5647> which encodes the amino
acid sequence <SEQ ID 5648>. Analysis of this protein sequence reveals the following:

Possible site: 53

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3902 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 10085> which encodes amino acid sequence <SEQ ID
10086> was also identified.

The protein has homology with the following sequences in the GENPEPT database.

DETFV+GRYEGFGPNGSMI I +TLTSNVNRT ANVRT + K GGN
Sbjct: 61 KHVIDKAIDKAKGGGDETFVQGRYEGFGPNGSMI IAETLTSNVKRTIAHVRTIFNKKGGN 120

Query: 140 MGASGSVSYLFDKKGVIVFAGDDADTVFEQLLEADVDVDDVEAEEGTITVYTAPTDLHKG 199
+GA+GSVSY+FD GVIVF G D D +FE LLEA+VDV DV EEG I +YT PTDLHKG
Sbjct: 121 IGAAGS VSYMFDNTGVTVFKGTDPDHI FEIIJjEAEVDVRDVTEEEGNI VI YTEPTDLHKG 180

Query: 200 IQALRDNGVEEFQVTELEMIPQSEWLEGDDLETFEIGj I DALESDDDVQKVYHNVAD 256
I AL+ G+ EF TELEMI QSEV L +DLE FE L+DALE DDDVQKVYHNVA+
Sbjct: 181 IAALKAAGITEFSTTELEMIAQSEVELSP3DLE I FEGLVDALEDDDDVQKVYHNVAN 237

A related DNA sequence was identified in S.pyogenes <SEQ ID 5649> which encodes the amino acid 
sequence <SEQ ID 5650>. Analysis of this protein sequence reveals the following: 

Possible site: 34

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2926 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 233/238 (97%), Positives = 236/238 (98%)

Query: 20 MGRKWANIVAKKTAKDGANSKVYAKFGVEIYVAAKQGEPDPESNSALKFVLDRAKQAQVP 79
MGRKWANIVAKKTAKDGA SKVYAKFGVEIYVAAKQGEPDPE N+ALKFV+DRAKQAQVP
Sbjct: 1 MGRKWANIVAKKTAKDGATSKVYAKFGVEIYVAAKQGEPDPELNTALKFVIDRAKQAQVP 60

Query: 80 KHVIDKAIDKAKGNTDETFVEGRYEGFGPNGSMIIVDTLTSNVNRTAANVRTAYGKNGGN 139
KHVIDKAIDKAKGNTDETFVEGRYEGFGPNGSMIIVDTLTSNVNRTAANVRTAYGKNGGN
Sbjct: 61 KHVIDKAIDKAKGNTDETFVEGRYEGFGPNGSMIIVDTLTSNVNRTAANVRTAYGKNGGN 120

Query: 140 MGASGSVSYLFDKKGVIVFAGDDADTVFEQLLEADVDVDDVEAEEGTITVYTAPTDLHKG 199
MGASGSVSYLFDKKGVIVFAGDDAD+VFEQLLEADVDVDDVEAEEGTITVYTAPTDLHKG
Sbjct: 121 MGASGSVSYLFDKKGVIVFAGDDADSVFEQLLEADVDVDDVEAEEGTITVYTAPTDLHKG 180

Query: 200 IQALRDNGVEEFQVTELEMIPQSEVVLEGDDLETFEKLIDALESDDDVQKVYHNVADF 257
IQALRDNGVEEFQVTELEMIPQSEVVLEGDDLETFEKLIDALESDDDVQKVYHNVADF
Sbjct: 181 IQALRDNGVEEFQVTELEMIPQSEVVLEGDDLETFEKLIDALESDDDVQKVYHNVADF 238

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1818 

A DNA sequence (GBSx1925) was identified in S.agalactiae <SEQ ID 5651> which encodes the amino
acid sequence <SEQ ID 5652>. Analysis of this protein sequence reveals the following:



>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2507 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 1819 

A DNA sequence (GBSx1926) was identified in S.agalactiae <SEQ ID 5653> which encodes the amino
acid sequence <SEQ ID 5654>. Analysis of this protein sequence reveals the following:

Possible site: 52

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1523 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA20826 GB:AL031541 hypothetical protein SCI35.37 [Streptomyces
coelicolor A3(2)]

Identities = 73/178 (41%), Positives = 101/178 (56%), Gaps = 2/178 (1%) 

Query: 35 VKNAGGLPVILPISEAESAKAYVEMIDKLIISGGQNVLPSYYGEEKIIESDDYSLARDIF 94
V+ AGGL +LP E A A V +D ++I+GG +V P YG E + + ARD +
Sbjct: 37 VQRAGGIAA^1LPPI3APEHAAATVARVIX3WIAGGPDVEPWYGAEPDPRTGPPARARDTW 96

Query: 95 EFALVEEALKQNKPIFAICRGMQLVNVALGGTIjNQSIDNHYQEPYIGFAHYLNVEKGSFL 154
E AL+E AL P+ ICRGMQL+NVALGGTL Q 1+ H + + H + G+
Sbjct: 97 ELALIEAAIAARVPLLGICRGMQIJLNVALGGTLVQHIERHAEVVGVFGGHPVRPVPGTLY 156

Query: 155 EGFISGDFKINSIjHRQSVKLLAEGLIVSARDPRDGTVEAYESRT-EQCIIGVQWHPEL 211
G + + + + H Q+V L GL+ SA DGTVEA E + ++GVQWHPE+
Sbjct: 157 AGAVPEETFVPTYHHQAVDRLGSGLVASAH-AADGTVEALEMPSGSGWVLGVQWHPEM 213

A related DNA sequence was identified in S.pyogenes <SEQ ID 5655> which encodes the amino acid
sequence <SEQ ID 5656>. Analysis of this protein sequence reveals the following:

Possible site: 52

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1210 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 111/230 (48%), Positives = 145/230 (62%), Gaps = 3/230 (1%) 

Query: 2 LTKPIIGITGNEREMSDIPGYYYDSVSRHISEGVfCNAGGLPVILPISEAESAKAYVEMID 61
+TKPIIGIT N+R + + + V +GGLP++LPI + +AK YV M+D
Sbjct: 1 MTKPIIGITANQRIJSMALDNLPWSYAPTGFVQAVTQSGC-LPLLLPIGDEAAAKTYVSMVD 60

Query: 62 KLIISGGQNvTjPSYYGEEKIIESDDYSLARDIFEFALVEEALKQNKPIFAICRGMQLVNV 121
K+I+ GGQNV P YY EEK DD+S RD FE A+++EA+ KPI ICRG QL+NV
Sbjct: 61 KI I LIGGQNVDPKYYQEEKA&FDDDFS PERDTFELAI IKEAITLKKP I LGI CRGTQLMNV 120

Query: 122 ALGGTMQSIDOTYQE-PYItBFflHYIiNVEKGSFLEGFISGDFKINSLHRQSVKLLAEGLI 180
ALGG LNQ ID+H+QE P +H + +E S L INS HRQS+K +A+ L
Sbjct: 121 ALGGNLNQHIDSHWQEAPSDFLSHEMIIEPDSILYPIYGHKTLINSFHRQSLKTVAKDLK 180

Query: 181 VSARDPRDGTVEAYESRTEQC - 1 IGVQWHPELMLH- QIENQTLFGYFVNE 228
V ARDPRDGT+EA S + +GVQWHPEL+ + E+ LF FVN+
Sbjct: 181 VIARDPRDGTIEAVISTNDAIPFLGVQWHPELLQGVRDEDLQLFRLFVND 230

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1820 

A DNA sequence (GBSx1927) was identified in S.agalactiae <SEQ ID 5657> which encodes the amino
acid sequence <SEQ ID 5658>. Analysis of this protein sequence reveals the following:

Possible site: 31
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5794 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.

Example 1821 

A DNA sequence (GBSx1928) was identified in S.agalactiae <SEQ ID 5659> which encodes the amino
acid sequence <SEQ ID 5660>. Analysis of this protein sequence reveals the following:

Possible site: 15
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0524 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8905> which encodes amino acid sequence <SEQ ID 8906> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: 22 Crend: 4
McG: Discrim Score: 8.37
GvH: Signal Score (-7.5): -0.64
Possible site: 21
>>> May be a lipoprotein
ALOM program count: 0 value: 6.74 threshold: 0.0
PERIPHERAL Likelihood = 6.74 112
modified ALOM score: -1.85

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 2919> which encodes the amino acid
sequence <SEQ ID 2920>. Analysis of this protein sequence reveals the following:

Possible site: 21

>>> May be a lipoprotein

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 120/162 (74%) , Positives = 141/162 (86%) , Gaps = 5/162 (3%) 

Query: 6 LAACSSKSHTTKTGK KEVNFATVGTTAPFSYVKDGKLTGFDIEVAKAVFKGSDNYK 61 

IAAC S S T ++G KEV FATVGTTAPFSY K G+LTG+DIEVAKAVFKGSD+YK 

Sbjct: 20 LAACGS - SKTAESGNQGSSKEVLFATVGTTAPFSYEKGGQLTGYDIEVAKAVFKGSDDYK 78 

Query: 62 VTFKKTEWSSVFTGIDSGKFQMGGNNISYSS3RSQKYLFSYPIGSTPSVLAVPKNSNIKA 121 

V+ FKKTEWSS + FTG+DSGK+OMGGNNI S + 4- ERS KYLFSYPIGSTPSVL VPK+S+IK+ 
Sbjct: 79 VSFKKTEWSSIFTGLDSGKYQMGGMNISFTKERSAKYLFSYPIGSTPSVLWPKDSDIKS 138 

Query: 122 YNDISGHKTQWQGTTTAKQLENFNKEHQKNPVTLKYTNENL 163 

++DI GH TQWQGTT+ QLE+FNK+H NPVTLK+TNEN+ 
Sbjct: 139 FDDIQGHTTQWQGTTSVAQLEDFNKKHSDNPVTLKFTNENI 180

SEQ ID 8906 (GBS71) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 17 (lane 4; MW 31.8kDa). 

GBS71-His was purified as shown in Figure 196, lane 7. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1822 

A DNA sequence (GBSx1929) was identified in S.agalactiae <SEQ ID 5661> which encodes the amino
acid sequence <SEQ ID 5662>. Analysis of this protein sequence reveals the following:

Possible site: 18

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2179 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
There is also homology to SEQ ID 2920: 

Identities = 64/91 (70%) , Positives = 78/91 (85%) 

Query: 1 MSDGKADFKLFDGPTVNAIIKNQGLTNLKTIPLTMPJDQPYIYFIFGQDQKDLQKYVNNRL 60 

+S+GKADFK+FD PTVNAIIKNQGL NLKTI LT +QP+IYFIF QDQ+ LQ +VN R+ 
Sbjct: 187 LSEGKADFKIFDAPTVNAIIKKQGLDNLKTIEbTSTEQPFIYFIFSQDQEKLQSFVNKRI 246 

Query: 61 KQLRKDGTLSKIAKEYLGGDYVPNEKDLVTP 91 

K+L DGTLSK+AKE+LGGDYVP++K+L P
Sbjct: 247 KELTADGTLSKLAKEHLGGDYVPSDKELKLP 277

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics. 
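
The identity counts quoted for each alignment can be checked row by row, because an identity is simply a column in which the query and subject rows carry the same residue. The sketch below applies this to the final ungapped row pair of the alignment to SEQ ID 2920 shown above (query residues 61-91, subject residues 247-277). The function name `count_identities` is an illustrative assumption, and the sketch does not attempt to reproduce the "Positives" score, which depends on a substitution matrix not stated in this text.

```python
def count_identities(query_row, sbjct_row):
    """Count aligned columns where both rows carry the same residue.

    Hypothetical helper: rows must already be aligned (equal length);
    gap characters ("-") never count as identities.
    """
    if len(query_row) != len(sbjct_row):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for q, s in zip(query_row, sbjct_row) if q == s and q != "-")


# Last row pair of the alignment to SEQ ID 2920 (31 columns, no gaps).
query_row = "KQLRKDGTLSKIAKEYLGGDYVPNEKDLVTP"
sbjct_row = "KELTADGTLSKLAKEHLGGDYVPSDKELKLP"

identical = count_identities(query_row, sbjct_row)
print(identical, "identical residues over", len(query_row), "columns")
```

Summing this count over every row pair of an alignment should reproduce the numerator of its "Identities" figure, provided the rows are free of transcription errors.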

Example 1823 

A DNA sequence (GBSx1930) was identified in S.agalactiae <SEQ ID 5663> which encodes the amino
acid sequence <SEQ ID 5664>. This protein is predicted to be 28 kDa outer membrane protein (yaeC).
Analysis of this protein sequence reveals the following:

Possible site: 41

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.44 Transmembrane 25 - 41 ( 25 - 42)

Final Results

bacterial membrane --- Certainty=0.1574 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB59825 GB:AJ012388 hypothetical protein [Lactococcus lactis] 
Identities = 110/283 (38%) , Positives = 175/283 (60%) , Gaps = 13/283 (4%) 



K+KP F+DY QPN+A + D+DINAFQ YNl-l -l- WNK +K -i

- IYS+++ L LK+G+T+AI PNDA+N SRAL+VLQSAGL+KL S

Query: 255 RKNWKKQKNAKAIQAILDAYHTDEVKKVIKDTSAD---IPQW 294
K+KN K + + AY + +K IK+ D +P W
Sbjct: 244 --TTSKEKNNKVYKEVAKAYASKATEKAI KEQYPDGGELPAW 283

There is also homology to SEQ ID 2132. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

A related GBS gene <SEQ ID 8907> and protein <SEQ ID 8908> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 4 
McG: Discrim Score: 7.47 
GvH: Signal Score (-7.5): -4.79 

Possible site: 21
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 1 value: -1.44 threshold: 0.0
INTEGRAL Likelihood = -1.44
PERIPHERAL Likelihood = 5.20
modified ALOM score: 0.79

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.1574 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

40.6/53.1% over 279aa
Lactococcus lactis
GP|6165402| hypothetical protein Insert characterized
ORF00442(364 - 1182 of 1482)
GP|6165402|emb|CAB59825.1||AJ012388(4 - 283 of 287) hypothetical protein {Lactococcus lactis}
%Match = 21.0
%Identity = 40.6 %Similarity = 63.0
Matches = 112 Mismatches = 96 Conservative Sub.s = 62

162 ' 192 222 252 282 312 342 372 

WDTFKNS*RlPWR*LRTK*ERSRYS*SEWIKTKEMSILSFLDySLKL*QETVYNNLILlTSYGIISLSQKLREFIMKLK 
■ • I : 

20 MNPKNR 

402 450 480 510 540 564 594 

HIVLGLALTTLLG- -VTFS- -NQEVSASSTSSKWKVGVMTFSDTEKARWDKIEKLVGDK- -AKIKFTEFTDYTQPNQAT 
= l = = :|: 1 = II = I =1 I 1111=1= == I = =1 1=11 1=11 111=1 

25 NIIIAVAVLILVALVAFFSLNHQGGWASAGEKTVKVGIMSGDKQ 

20 30 40 50 SO 70 80 

,624 654 684 714 744 774 804 834 

ANKDVmNAFQHYNFLENWNKENKKNLIPLEKTY 
30 , ■': hllMII ||::: ||| :| ::: : ||: |: |||::: | I I = I = I = I I I I I I = I 1111 = 1111111 = 1 

LSGDIDINAFQSYNWKTWNKftHKSDIVAVGNTYITPMHIYSKEISKLSDLKEGSTVAIPNDASNESRALFVLQSAGLLK 
100 110 120 130 140 150 160 

861 891 921 951 981 1011 1041 1071 

35 LNVS-GKKVATVANITSNKKDINIQELDASQTPRALKDVDAAIINNTYIEQANLKPSDAIFVEKSDKNS 

II 1= = =11 I ■ =1 = 111111111 I = = = l I 1 = 1 l== = l=.| =1 I 1 = 11 II 

LTTSDSSKLVGLPDITENPHQLKFKEVDASQTPRALDSVALSVVNYNYATAASLPKSESVFMEPLNKTSAQYINFIA 

180 190 200 210 220 230 240 

40 1101 1131 1161 1182 1212 1242 1272 1302 

NWKKQKNAKAIQAILDAYHTDEVKKVIKDTSAD IPQW*RELTV*V*QGILIGYNLSAI*P*RAWDEYNVPGSWIVFE 

• 1=11 I = = II = =1 11= I =11 =1 
TTSKEKNNKVYKEVAKAYASKATEKAI KEQYPDGGELPAWDLKL 
260 270 280 

SEQ ID 8908 (GBS35) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 11 (lane 2; MW 31.6kDa).

The GBS35-His fusion product was purified (Figure 96A; see also Figure 192, lane 6) and used to immunise
mice (lane 2 product; 20µg/mouse). The resulting antiserum was used for Western blot (Figure 96B), FACS
(Figure 96C), and in the in vivo passive protection assay (Table III). These tests confirm that the protein is
immunoaccessible on GBS bacteria and that it is an effective protective immunogen.

Example 1824 

A DNA sequence (GBSx1931) was identified in S.agalactiae <SEQ ID 5665> which encodes the amino
acid sequence <SEQ ID 5666>. Analysis of this protein sequence reveals the following:

Possible site: 37
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3126 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 26 LRDLIAIKSI FAQKVGLNDLSSYLGEVF I KA3AEVI I DDSYSAPFIVANFKSSKVDAKRI 85 

LR L+A+ S+ AQ h 4 + + + G V AP ++A 4 
Sbjct: 16 LRALVALPSVSAQGRMLPETADAVAGLLRAEGFGVQQFPGTVAPVLLAEAGEGPFT L 72 

Query: 86 IFyNHYDTVPADEVEQWTEDPFTLSLRYGKMYGIRGVDDDKGHITARLSAVKKYLSRHKGE 145 

4 YNHYD P D +E W PF L4 R G44YGRG DDKG + 4RL4AV4 4 G 
Sbjct: 73 LIYNHYDVQPEDPLELWDTPPFELTERGGRLYGRGASDDKGELASRLAAVRA-VREQLGH 131 

Query: 146 LPLDITFIVEGAEESASVGLDYYLEKYQEQLQGADLIVWEDGPKNPKGQLEIAGGNKGIV 205 

LP4 I 444EG EE S L4 44 44 4LQ AD WE G 4P4G4 44 G KG44 
Sbjct: 132 LPVKIKWLIEGEEEVGSPTLERF^/AEHAAELQ-ADGCWWEFGGISPEGRPILSLGLKGVM 190 



Query: 206 TFDLSVSSADVDIHSSFGGWDSSTWYLIQAUfTLRDNKGHILVEGIYDKVIPPTKRELE 265 

4l> AD D4HSS G V4D4 4 L 4A4 4LRD 4G44 4 G YD V 4 4 4 
Sbjct: 191 CLELRCRVADSDLHSSLGAVIDNPLYCLARAVASLRDEQGNVTIPGFYDDVRAASGADRQ 250 

Query: 266 LVEKySYRSAKALEGAYQLVLPSLADSHKTFLRKLYFEPSIAIEGITSGYQGEGVKTILP 325 

44 4A4 44P 4 44 P44G GYQGEG KT4I1P 

Sbjct: 251 AIAQIP-GDGQATODTFGWRP--1ATGPAYNERTNLHPVVNUNGWGGGYQGEGSKTVLP 307 

Query: 326 AYAKCKAEVRLVPGLTPKGVLDSIQNHLKEHGFKDIELT-YTLGEMSYRSDMSAPSILKV 384 

K + RLVP P VL 44 HL G DIE4 4 R4D P 4 

Sbjct: 308 GftGFVKLDFRLVPDQDPMlVLSLLREHLTAQGLSDIEVVEa^EftHQKPARADAGHPWOAC 367 

Query: 385 VDLAEQFYPEGISLLPTSPGTGPMY LVHQALRAPIAAIGIGHANSRDHGVDENV 438 

V A 4 4 4 P4S 4GPM4 L . P A4GIG4 R H 4EN4 

Sbjct: 368 VAAARAAHGQDPIVHPSSGASGPMFPFTGGAGGGGLGIPCVAVGIGNHAGRVHAPNENI 426 

There is also homology to SEQ ID 2588. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1825 

A DNA sequence (GBSx1932) was identified in S.agalactiae <SEQ ID 5667> which encodes the amino
acid sequence <SEQ ID 5668>. This protein is predicted to be amino acid ABC transporter, ATP-binding
protein. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5366 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB59828 GB:AJ012388 hypothetical protein [Lactococcus lactis] 
Identities = 187/338 (55%) , Positives = 256/338 (75%) , Gaps = 12/338 (3%) 

Query: 6 IIKLDNIDVTFHQKKREINAVKDVTIHINCGDIYGIVGYSGAGKSTLVRVTNLLQEPSAG 65 
II4L4N4 V FHQK R 4 AVK+ T+HI 4GDIYG44GYSGAGKSTLVR INLLQ4P4 G

Sbjct: 4 IIEIjNNLSVQFHQKGRLVTAVKNATLHIEKGDIYGVIGYSGAGKSTLVRTINLLQKPTEG 63 



Query: 66 KITIDDQVIYD- -NKVTLTSTQLREQRREIGMI FQHFNLMSQLTAEQNVAFALKHSG 120 

41 1+ 4 I+D N V T 4LRE R44IGMI FQHFNL4S4 T NVAFAL4HS 



WO 02/34771 



PCT/GB01/04789 















Sbjct: QI VINGEKI FDSENPVKFTGAKLREFRQKIGMIFQHBmiLSEKTVFMNVAFALQHSQIED 123

Query: LSKiaKAAKVAKLLEIjVGLSDRAQNY?SQLSGGQKQRVAIARALAM)PKILIS 173
L+K+ K KV +LL+LV L+D + YP+QLSGGQKQRVAIARALANDP+ILIS
Sbjct: KNGKKRYLTKKEKITOKVTELLlOliVDLADLSDEYPAQLSGGQKQRVAIARALANDPEILIS 183

Query: DESTSALDPKTTKQIIALLQDLNKKLGLTIVLITHE^IVKDIAlffiVAVMQNGKLlEEGS 233
DE TSALDPKTT QIL LL+ L++KLG+T+VLITHEMQ+VK+IAN+VAVMQNG++IE+ S
Sbjct: 184 DEGTSALDPKTTOQILDLLKSLHEKLGITVVLITHEMQWKEIANKVAVMQNGEIIEQNS 243

Query: 234 VLDIFSHPRESLTQDFIKIATGIDEAMLKIEQQEWKNLPVGSKLVQLKYAGHSTDEPLL 293
++DIF+ P+E+LT+ FI4 + ++ + + + E++ h +L+ L Y+G +4P++
Sbjct: 244 LIDIFAQPKEALTKQFIETTSSraRFIASLSKTELLAQLADDEELIHLDYSGSELEDPW 303

Query: 294 NQIYKEFEVTANILYGNIE ILDGI PVGEMWILSGDEE 331
+ 1 K+F+VT NI YGN+E+L G P G +V+ L G E
Sbjct: 304

There is also homology to SEQ ID 76. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
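
The alignment summaries in these examples share one layout: Identities, Positives and Gaps, each given as a count of alignment columns over the alignment length. As an illustration only, and not part of the original analysis, the following Python sketch reads such a line and recomputes the underlying fractions. The names `parse_alignment_summary` and `fraction` are hypothetical. The rounding rule behind the printed percentages is not stated in the text, so the sketch returns raw fractions and does not try to reproduce the printed values.

```python
import re

# One "Name = matched/length" field of a summary line.
SUMMARY_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_alignment_summary(line):
    """Return {'Identities': (matched, length), ...} for one summary line."""
    return {name: (int(num), int(den)) for name, num, den in SUMMARY_RE.findall(line)}


def fraction(counts):
    """Convert a (matched, length) pair into a fraction between 0 and 1."""
    matched, length = counts
    return matched / length


# Summary line reported for the Lactococcus lactis hit in Example 1825.
line = "Identities = 187/338 (55%), Positives = 256/338 (75%), Gaps = 12/338 (3%)"
summary = parse_alignment_summary(line)
identity_fraction = fraction(summary["Identities"])
```

Working from the counts rather than the printed percentages avoids depending on how the original program rounded.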

Example 1826 

A DNA sequence (GBSx1933) was identified in S.agalactiae <SEQ ID 5669> which encodes the amino
acid sequence <SEQ ID 5670>. This protein is predicted to be ABC transporter, permease protein. Analysis
of this protein sequence reveals the following:

Possible site: 55

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-12.79 Transmembrane 203 - 219 ( 197 - 225)
INTEGRAL Likelihood = -8.86 Transmembrane 73 - 89 ( 69 - 102)
INTEGRAL Likelihood = -7.38 Transmembrane 38 - 54 ( 35 - 56)
INTEGRAL Likelihood = -1.12 Transmembrane 103 - 119 ( 103 - 119)

Final Results

bacterial membrane --- Certainty=0.6116 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
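
Each "Final Results" block lists one certainty value per candidate location, and the predicted location is the one with the highest value. As a hedged sketch only (the original analysis program is not shown in the text), the Python below parses such a block and picks the top entry. The names `parse_final_results` and `predicted_location` are hypothetical, and the sample assumes the stray spaces that extraction left inside the numbers have already been removed.

```python
import re

# Matches e.g. "bacterial membrane --- Certainty=0.6116 (Affirmative)".
RESULT_RE = re.compile(
    r"bacterial\s+(\w+)\s+(?:-+\s+)?Certainty\s*=\s*([0-9.]+)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Map each location to a (certainty, verdict) pair."""
    results = {}
    for site, certainty, verdict in RESULT_RE.findall(text):
        results[site] = (float(certainty), verdict.strip())
    return results


def predicted_location(results):
    """Return the location with the highest certainty value."""
    return max(results, key=lambda site: results[site][0])


# Cleaned copy of the block reported for SEQ ID 5670.
block = """
bacterial membrane --- Certainty=0.6116 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
"""
results = parse_final_results(block)
best = predicted_location(results)
```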

A related GBS nucleic acid sequence <SEQ ID 10083> which encodes amino acid sequence <SEQ ID 
10084> was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB59829 GB:AJ012388 hypothetical protein [Lactococcus lactis] 
Identities = 137/231 (59%), Positives = 171/231 (73%), Gaps = 1/231 (0%) 

Query: 1 MIEWIQTHLPNVYQMGWEGAYGWQTAIVQTLYMTFWSFLIGGLMGLLGGLFLVLTSPRGV 60 
M EW PNV +GW G GW TAIVQTLYMTF S LIGGL+GL+ G+ +V+T+ G+

Sbjct: 1 MAEWFAHTFPNVWLGWTGETGWWTAIVQTLYMTFISALIGGLLGLIFGIGvVVTAEDGI 60 

Query: 61 IANKLVFGvLDKVVSVFRALPFIILLALIAPVTRVTVGTTLGSPAALVPLSLAVFPFFAR 120 
N+ +F 4-LDK+VS+ RA PFIILLA IAP+T+++VGT +G AALVPL+L V PF+AR 
Sbjct: 61 TPMPLFWILDKIVSIGRAFPFIILIiAAIAPLTKILVGTQIGVTAALVPLALGVAPFYAR 120

Query: 121 QVQWLAELDGGVIEAAQASGGTLWDI I -WYLREGLPDLIRVSTVTLISLVGETAMAGA 179 

QVQ L +D G +EARQ G D1+ VYLRE L LIRVSTVTLISL+G TAMAGA 
Sbjct: 121 QVQASLESvUHGKVEAAQlVGADFLDIVFTVYLREELASLIRVSTVTLISLIGLTAMAGA 180 


Query: 180 IGAGGLGSVAITKGYNYSRDDITLVATILILLLIFFIQFLGDFLTRRLSHK 230 

IGAGGLG+ AI+ GYN +D+T ATILIL+ + +Q +GDFL RR+SH+ 
Sbjct: 181 IGAGGLGNTAISYGYNRFANDVTWFATILILIFVIiLVQLVGDFLARRVSHR 231 
A related DNA sequence was identified in S.pyogenes <SEQ ID 5671> which encodes the amino acid
sequence <SEQ ID 5672>. Analysis of this protein sequence reveals the following:

Possible site: 32
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-11.15 Transmembrane 194 - 210 ( 187 - 215)
INTEGRAL Likelihood =-10.67 Transmembrane 28 - 44 ( 20 - 52)
INTEGRAL Likelihood = -8.12 Transmembrane 70 - 85 ( 62 - 91)
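
The INTEGRAL lines above give, for each predicted membrane-spanning segment, a likelihood score and the residue range of the segment. As an illustrative sketch under stated assumptions (the function name `parse_segments` is hypothetical, and the outer range in parentheses is ignored), the Python below extracts the score and core range from each line and orders the segments by position along the sequence.

```python
import re

# Captures the score and the core residue range of one INTEGRAL line.
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_segments(text):
    """Return (score, start, end) tuples sorted by start residue."""
    segments = [(float(s), int(a), int(b)) for s, a, b in TM_RE.findall(text)]
    return sorted(segments, key=lambda seg: seg[1])


# The three segments reported for SEQ ID 5672.
block = """
INTEGRAL Likelihood =-11.15 Transmembrane 194 - 210 ( 187 - 215)
INTEGRAL Likelihood =-10.67 Transmembrane 28 - 44 ( 20 - 52)
INTEGRAL Likelihood = -8.12 Transmembrane 70 - 85 ( 62 - 91)
"""
segments = parse_segments(block)
```

Sorting by start residue makes it easier to compare the segment layout of two related proteins, such as the GAS and GBS sequences aligned in this example.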

Final Results

bacterial membrane --- Certainty=0.5458 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:CAB59829 GB:AJ012388 hypothetical protein [Lactococcus lactis] 
Identities = 123/213 (57%), Positives = 153/213 (71%), Gaps = 1/213 (0%) 



Query: 9
Sbjct: 19

Query: 69 AIPFVILIAILASFTYLLLRTTLGATAALVPLTFATFPFYARQVQWFSELDKGVIEAAQ 128
A PF+IL+A +A T +L+ T +G TAALVPL PFYARQVQ +D G +EAAQ
Sbjct: 79 AFPFI ILLAAIAPLTKILVGTQIGVTAALVPLALGVAPFYARQVQASLESVDHGKVEAAQ 138

Query: 129
GA F DIV WL E L LIRVSTVTLISL4G TAMAGAIGAGGLGN AISYGYNRF
Sbjct: 139

188
Sbjct:



An alignment of the GAS and GBS proteins is shown below.

Identities = 146/212 (68%), Positives = 172/212 (80%)

Query: 19 GAYGWQTAIVQTLYMTFWSFLIGGLMGLLGGLFLVLTSPRGVIANKLVFGVLDKWSVFR 78
G GW AI TLYMT F++GG +GLL GL LVLT P GVI NK + V+DKV S+FR
Sbjct: 9 GDAGWGLAIWNTLYMTIVPFIVGGAIGLLLGLLLVLTGPDGVIENKTICWVIDICVTSIFR 68

Query: 79 ALPFIILLALIAPVTRVIVGTTLGSPAALVPLSLAVFPFFARQVQWLAELDGGVIEAAQ 138
A+PF+IL+A++A T +++ TTLG+ AALVPL+ A FPF+ARQVQW +ELD GVIEAAQ
Sbjct: 69 AIPFVILIAILASFTYLLLRTTLGATAALVPLTFATFPFYARQVQWFSELDKGVIEAAQ 128

Query: 139 ASGGTLWDIIWYLREGLPDLIRVSTVTLISLVGETAMAGAIGAGGLGSVAITKGYNYSR 198
ASG T WDI+ VYL EGLPDLIRVSTVTLISLVGETAMAGAIGAGGLG+VAI + GYN
Sbjct: 129 ASGATFWDIVKOTLSEGLPDLIRVSTVTLISLVGETAMAGAIGAGGLGNVAISYGYNRFN 188

Query: 199 DDITLVATILILLLIFFIQFLGDFLTRRLSHK 230
+D+T VATI + ILL+IF IQF+GD LTRR SHK
Sbjct: 189 NDVTWVATI I ILLI I FAIQFIGDSLTRRFSHK 220

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1827 

A DNA sequence (GBSx1934) was identified in S.agalactiae <SEQ ID 5673> which encodes the amino
acid sequence <SEQ ID 5674>. This protein is predicted to be alcohol dehydrogenase, zinc-containing
(Zn-dependent). Analysis of this protein sequence reveals the following:
Possible site: 21

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.92 Transmembrane 71 - 87 (

Final Results

bacterial membrane --- Certainty=0.2168 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9419> which encodes amino acid sequence <SEQ ID 9420> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF41759 GB:AE002488 alcohol dehydrogenase, zinc-containing
[Neisseria meningitidis MC58] 
Identities = 135/246 (54%) , Positives = 186/246 (74%) , Gaps = 1/246 (0%) 

Query: 3 SHCEDGGWILGHLIEGTQAEYWPHAD3SLYHAPEGVCDDALVMLSDILPTSYEIGVLP 62 

SHC +GGWILG++I+GTQAEYV P+AD SL P+ V ++ ++LSD LPT++EIGV 
Sbjct: 102 SHCE^NGGWILGYMIDGTQAEYWTPYADNSLVPLPDNVNEEIALLLSDALPTAHEIGVQY 161 

Query: 63 SHIKPGDTVCIVGAGPIGLSALLTAQFYSPAKIIMVDLSQKRLEASKKFGATHTILSTST 122 

+KPGDTV I GAGP+G+SALLTAQ YSPA 11+ D+ + RL+ +K+ GATHTI + ++ 
Sbjct: 162 GDWPGDTVFIAGAGPVGMSALLTAQLYSPAAIIVCDMDENRLKLAKELGATHTI-NPAS 220 

Query: 123 QEVKEEIDKITKGRGVDWLECVGYPATFDICQNWSIGGHIANVGVHGKPVEFNLQDLW 182 

EV +++ I GVD +E VG PAT+++CQ++V GGHIA VGVHG+ V+F L+ LW 
Sbjct: 221 GEVSKQVFAIVGEDGVDC^IEAVGIPATWNMCQDIVKPGGHIAWGVHGQSVDFKLEKLW 280 

Query: 183 IKNITLNTGLW2WTTEMLLEVLETGKIDATQLWHHFKLSEIEEAYKVFKAAEENNTLK 242 

IK + + TGLVNANTTEML++ + + +D T+++THHFK SE+E+AY VFK A EN +K 
Sbjct: 281 IKKLAITTGLVNANTTEMLMKAISSSSVDYTKMLTHHFKFSEXjEKAYDVFKHAAENQVMK 340 

Query: 243 VIIEND 248 

V++E D 
Sbjct: 341 WLEAD 346 

A related DNA sequence was identified in S.pyogenes <SEQ ID 785> which encodes the amino acid 
sequence <SEQ ID 786>. Analysis of this protein sequence reveals the following: 
Possible site: 23

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -5.41 Transmembrane 184 - 200 ( 183 - 203)

Final Results

bacterial membrane --- Certainty=0.3166 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 199/250 (79%) , Positives = 226/250 (89%) 

Query: 1 MPSHCEDGGWILGHLIEGTQAEYVHIPHADGSLYHAPEGVCDDALVMLSDILPTSYEIGV 60 

+ SHC+DGGWILGHLI GTQAEYVHIPHADGSLYHAP+ + D+ALVMLSDILPTSYEIGV 
Sbjct: 114 LSSHCQDGGWILGHLINGTQAEYVHIPHADGSLYHAPDTIDDEALVMLSDILPTSYEIGV 173 

Query: 61 LPSHIKPGDTVCIVGAGPIGLSALLTAQFYSPAKIIMVDLSQKRLEASKKFGATHTILST 120 

LPSH+KPGD VCIVGAGP+GL+ALLT QF+SPA IIMVDLSQ RLEA+K FGATHTI S 
Sbjct: 174 LPSHWPGDNVCIVGAGPVGIAALLWQFFSPANIIMVDLSQNRLEAAKTFGATHTICSG 233 

Query: 121 STQEVKEEIDKITKGRGVDWLECVGYPATFDICQNWSIGGHIANVGVHGKPVEFNLQD 180 

S++EVK ID IT GRGVD+ +ECVGYPATFDICQ ++S+GGHIANVGVHGKPV+FNL + 
Sbjct: 234 SSEEVKAIIDDITNGRGVDISMECWGYPATFDICQKIISVGGHIANVGVHGKPVDFNLDE 293 
Query: 181 LWIK3STITIOTGLWAITrTEmLEVLETGKIDATQLVTHHFKLSEIEEAYKyFKAAEEtOT 240

LWIKNITLNTGLWANTTEMLL VL+TGKIDRT+L+THHFKLSE+E+AY+ FK A NN 
Sbjct: 294 LWI KNITIjNTGLVNANTTEMLLNVLKTGKI DATRL I THH FKLSEVEKAYETFKHAGANNA 353 

Query: 241 LKVIIENDIT 250

LKVTI+NDI+ 
Sbjct: 354 LKVIIDNDIS 363 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1828 

A DNA sequence (GBSx1935) was identified in S.agalactiae <SEQ ID 5675> which encodes the amino
acid sequence <SEQ ID 5676>. This protein is predicted to be a dehydrogenase fragment. Analysis of this
protein sequence reveals the following:

Possible site: 20

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-10.46 Transmembrane 47 - 63 ( 33 - 66)

Final Results

bacterial membrane --- Certainty=0.5182 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

There is also homology to SEQ ID 786:

Identities = 23/38 (60%) , Positives = 28/38 (73%) 

Query: 7 WRNSNMRAATYLSANELSLTDKAKPQVI KFTDAWXLV 44 
++ NM+AATYLS L L DK KP +IKPTDA+V LV 
Sbjct: 10 YKKLNMKAATYLSTGNLQLIDKPKPVIIKPTDAIVQLV 47

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 1829 

A DNA sequence (GBSx1936) was identified in S.agalactiae <SEQ ID 5677> which encodes the amino
acid sequence <SEQ ID 5678>. Analysis of this protein sequence reveals the following:

Possible site: 39

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1001 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 
Example 1830

A DNA sequence (GBSx1937) was identified in S.agalactiae <SEQ ID 5679> which encodes the amino
acid sequence <SEQ ID 5680>. This protein is predicted to be branched chain amino acid transport system
II carrier protein (brnQ). Analysis of this protein sequence reveals the following:

Possible site: 44

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -9.66 Transmembrane 158 - 174 ( 154 - 177)
INTEGRAL Likelihood = -6.64 Transmembrane 233 - 249 ( 231 - 252)
INTEGRAL Likelihood = -5.20 Transmembrane 37 - 53 ( 30 - 57)
INTEGRAL Likelihood = -3.98 Transmembrane 90 - 105 ( 87 - 108)
INTEGRAL Likelihood = -0.80 Transmembrane 130 - 146 ( 130 - 146)

Final Results

bacterial membrane --- Certainty=0.4864 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9417> which encodes amino acid sequence <SEQ ID 9418> 
was also identified. 



The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC00400 GB:AF008220 branch-chain amino acid transporter 
[Bacillus subtilis] 
Identities = 89/250 (35%), Positives = 139/250 (55%), Gaps = 18/250 (7%) 

Query: 1 MDALAS IAFAI IVIQASKQYGAITKKEITSMALKSGAIATFLLAFIYI FVGRIGATSQSL 60

MDALASI F ++V+ A K G K + + +K+G IA L FIY+ + +GATS + 
Sbjct: 199 MDAIASIVFGVVVWAWSKGVTQSKALAAACIKAGVIAALGLTFIYVSLAYLGATSTNA 258 



Query: 61 FKFANGSFLLHNTPI-DGGHVLSQSANFYLGIVGQAILGTAIFLACLTTATGLITACAEY 119 

P+ +G +LS S+++ G +G +LG AI +ACLTT+ GL+T+C +Y 
Sbjct: 259 IG PVGEGAKILSASSHYLFGSLGNIVLGAAITVACLTTSIGLVTSCGQY 307 

Query: 120 FHKLLPKISHITWATIFTLIAITFYFGGLSEIIRWSLPVLYLLYPLTIVLIFLVFFDQKF 179 

F KL+P +S+ TI TL ++ GL++II +S+P+L +YPL IV+I L F D+ F 

Sbjct: 308 FSKLIPALSYKIWTIVTLFSLIIANFGLAQIIAFSVPILSAIYPLAIVIIVLSFIDKIF 367 

Query: 180 ESSRIVYQTSIAATAVAALYDALSKLGEWXGLFTIPSALTTFFTIC\'VPLGEYSMGWISFA 239 

+ R VY + T + ++ D + G G +L F +PL +GW+ 

Sbjct: 368 KERREVYIACLIGTGLFSILDGIKAAGFSLG SLDVFLNANLPLYSLGIGWVLPG 421 

Query: 240 ICGVLVGLIL 249 

I G 4+G +L 
Sbjct: 422 IVGAVIGYVL 431 



A related DNA sequence was identified in S.pyogenes <SEQ ID 2233> which encodes the amino acid
sequence <SEQ ID 2234>. Analysis of this protein sequence reveals the following:

Possible site: 21

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-10.83 Transmembrane 235 - 251 ( 228 - 258)
INTEGRAL Likelihood = -8.49 Transmembrane 434 - 450 ( 429 - 454)
INTEGRAL Likelihood = -8.12 Transmembrane 359 - 375 ( 356 -
INTEGRAL Likelihood = -7 Transmembrane 150 - 166 ( 144 -
INTEGRAL Likelihood = -6.00 Transmembrane 298 - 314 ( 288 - 316)
INTEGRAL Likelihood = -5.95 Transmembrane 42 - 58 ( 38 -
INTEGRAL Likelihood = -3.35 Transmembrane 336 - 352 ( 335 - 354)
INTEGRAL Likelihood = -2.81 Transmembrane - 215 ( 198 - 218)
INTEGRAL Likelihood = -2 Transmembrane 120 - 136 ( 120 - 138)
INTEGRAL Likelihood = Transmembrane 390 - 406 ( 390 - 407)
INTEGRAL Likelihood = -1.01 Transmembrane 81 - 97 ( 81 - 97)

Final Results

bacterial membrane --- Certainty=0.5331 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 161/253 (63%) , Positives = 197/253 (77%) 

Query: 1 MDMiRSIAFAIIVIQASKQYGAITKKEITSMALKSGAIATFLIAFIYIFVGRIGATSQSL 60 

MDALAS+ FAI+VI+A+KQ+GA T KE+T + L SGAIA LLA +YIFVGRIGATSQSL 
Sbjct: 202 MDALASLVFAI LVI EATKQFGAKTDKEMTKITLI SGAIAILLLALVYI FVGRIGATSQSL 261 

Query: 61 FKFANGSFLLHNTPIDGGHVLSQSANFYLGIVGQAILGTAIFIACLTTATGLITACAEyF 120 

F F +GSF LH P++GG +LS ++ FYLG +GQA L IFLACLTT+TGLIT+ AEYF 
Sbjct: 262 FPFIDGSFTLHGNPvNGGQILSHASRFYLGGIGQAFIAWIFIACLTTSTGLITSSAEYF 321 

Query: 121 HKLLPKISHITWATIFTLIAITFYFGGLSEIIRWSLPVTYLLYPLriVLIFLVFFDQKFE 180 

HKL+P +SHI WATIFTL++ FYFGGLS II WS PVL+LLYPLT+ LIFLV + F 
Sbjct: 322 HKLVPALSHIAWATIFTLLSAFFYFGGLSVIINWSAPVLFLLYPLTVDLIFLVLAQKCFN 381 

Query: 181 SSRIVYQTSIAATAVAALYDALSKLGEMTGLFTIPSALTTFFTKWPLGEYSMGWISFAI 240 

+ IVY+T+I T + A++DAL L +MTGLF +P A+ TFF K VPLG++SMGWI FA 
Sbjct: 382 NDPIVYRTTIGLTFI PAI FDALLTLSQMTGLFHIiPEAVvTFFQKTVPLGQFSMGWI I FAA 441 

Query: 241 CGVLVGLILKKVK 253 

G L+GLIL K K 
Sbjct: 442 IGFLIGLILSKTK 454 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1831 

A DNA sequence (GBSx1938) was identified in S.agalactiae <SEQ ID 5681> which encodes the amino
acid sequence <SEQ ID 5682>. This protein is predicted to be 30S ribosomal protein S12 (rpsL). Analysis
of this protein sequence reveals the following:

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3698 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9429> which encodes amino acid sequence <SEQ ID 9430>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.



Query: 1 MPTINQLTOKPRKSKVEKSDSPALNIGYNSHRKVHTKLSAPQKRGVATRVGTMTPKKPNS 60 
MPTINQLVRKPRKSKVEKS SPALN+GYNSH+KV T +S+PQKRGVATRVGTMTPKKPNS

Sbjct: 1 MPTINQLTOKPRKSKVEKSKBPALNVGYNSHKKVQTNVSSPQKRGVATRVGTMTPKKPNS 60 



Query: 61 ALRKFARVRLS 71 

ALRKFARVRLS 
Sbjct: 61 ALRKFARVRLS 71 
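
The Identities figure quoted with each alignment counts the columns in which the query and subject rows carry the same residue. As a minimal sketch of that tally (the function name `count_identities` is hypothetical, and gap columns are assumed to be written as "-"), the Python below counts identical columns for the final block of the alignment above.

```python
def count_identities(query, subject):
    """Count alignment columns where both rows carry the same residue."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    # A column counts only when the residues match and it is not a gap.
    return sum(1 for q, s in zip(query, subject) if q == s and q != "-")


# Residues 61 to 71 of the query and subject rows shown above.
query_row = "ALRKFARVRLS"
subject_row = "ALRKFARVRLS"
identical = count_identities(query_row, subject_row)
```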
A related DNA sequence was identified in S.pyogenes <SEQ ID 5683> which encodes the amino acid
sequence <SEQ ID 5684>. Analysis of this protein sequence reveals the following:

d N-terminal signal 



Final Results

bacterial cytoplasm --- Certainty=0.3879 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 44/48 (91%) , Positives = 47/48 (97%) 

Query: 24 LNIGYNSHRKVHTKLSAPQKRC3VATRVGTKTPKKPNSALRKFARWLS 71 

LNIGYNSH+KV TK++APQKRGVATRVGTKTPKKPNSALRKFARVRLS 
Sbjct: 1 LNIGTNSHKKVQTKMAAPQKRGVATRVGTKTPKKPNSALRKFARVRLS 48 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1832 

A DNA sequence (GBSx1939) was identified in S.agalactiae <SEQ ID 5685> which encodes the amino
acid sequence <SEQ ID 5686>. This protein is predicted to be purR. Analysis of this protein sequence
reveals the following:

Possible site: 30 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -0.37 Transmembrane 142 - 158 ( 142 - 159) 

Final Results

bacterial membrane --- Certainty=0.1150 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 3
++R+ER+V +N+LIN+P + +LN + Y AKSSISED+ IK+ FE +G ++T
Sbjct: 1

Query: 63
G+ GGV FTP I + + E+ +E+ + L E +RILPGGYIYLSD+L TP L+ IG+II
Sbjct: 61

Query: 123
Sbjct:

Sbjct: 181

Query: 242
YKS+L+V ID+ N
Sbjct: 241
A related DNA sequence was identified in S.pyogenes <SEQ ID 5687> which encodes the amino acid
sequence <SEQ ID 5688>. Analysis of this protein sequence reveals the following:

Possible site: 41 
>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -1.97 Transmembrane 142 - 158 ( 142 - 160) 

Final Results

bacterial membrane --- Certainty=0.1786 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 



Query: 3 LRRSERMWISireLINNPYKLTSLNTFATKYEAAKSSISEDIAIIKKAFEEANIGDIDTL 62
++R+ER+V +N+LIN+P ++ +LN + YE AKSSISED+ IK+ FE +G ++T
Sbjct: 1 MKRNERLVDFTNFLINHPNQMLNLNELSKHYEVAKSSISEDLVFIKRVFENQGVGLVETF 60

Query: 63
h +++ + L E +RILPGGYIYLSD+L TP L+ IG+II
Sbjct: 61

Query: 123
+++D VMT+ATKG+P+A +VA IL VPFVIVRRD K+TEG+T++VNY E
Sbjct: 121

Query: 183
+E M LSKRSL VLIVDDF+KG GTI GM SL+ EFD L GVAVF E
Sbjct: 181

Query: 242
+KS+LKV ID+- N ++ V++GNIF+
Sbjct: 241



An alignment of the GAS and GBS proteins is shown below. 

Identities = 234/270 (85%), Positives = 255/270 (93%) 



Query: : 

MKLRRSERMWI SNYLINNPY LTSLNTFA+KY AAKSSISEDIAIIKKAFE+A IGDI 
Sbjct: 1 MKLRRSERMVVISNYLINNPYKLTSIjNTFATKYEAAKSSISEDIAIIKKAFEEANIGDID 60 

Query: 61 TVTGASGGVIFTPTrAEAEAKEIVEELRQRLSENDRILPGGYIYLSDLLSTPKMLQSIGR 120 

T+TGASGGVIFTP+I+E EA+ IVE+L QRLSE+DRILPGGYIYLSDLLSTPK+LQ+IGR 
Sbjct: 61 TLTGASGGVIFTPSISETEARTIVEDLCQRLSESDRILPGGYIYLSDLLSTPKILQNIGR 120 

Query: 121 IIANAYRGQKIDAVMWATKGVPIANAVANVLDVPFVIVRRDLKITEGSTVSvNYASGSS 180 

I IANA++G+KIDAVMTVATKGVPLANAVAN+L VPFVIvRRDLKITEGSTVSVNYAS SS 
Sbjct: 121 IIANAFKGEKIDAV>TrvATKGVPIANAV7ANILSVPFVIVRRDLKITEGSTVSVNYASASS 180 

Query: 181 GRIEKMFLSKRSLKPNSRVLIVDDFLKGGGTVSGMISLLSEFDSTLVGVAVFAENAQEQR 240 

RIEKMFLSKRSLKPNSRVLIVDDFLKGGGT++GMI SLL+EFDSTLVGVAVFAENAQ +R 
Sbjct: 181 DRIEKMFLSKRSLKPNSRVLIVDDFLKGGGTITGMISLLTEFDSTLVGVAVFAENAQSER 240 

Query: 241 EKMAYKSLLRVSE ID VKNNRVSVEAGNI FD 270 

E+M +KSLL+VSEIDVKNN V VE GNIFD 
Sbjct: 241 EQMTFKSLLKVSEIDVKNNNVVVEVGNIFD 270 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1833

A DNA sequence (GBSx1940) was identified in S.agalactiae <SEQ ID 5689> which encodes the amino
acid sequence <SEQ ID 5690>. This protein is predicted to be cmp-binding-factor 1. Analysis of this
protein sequence reveals the following:

3 N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.1753 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC44803 GB:U21636 cmp-binding-factor 1 [Staphylococcus aureus] 

Identities = 140/310 (45%), Positives = 195/310 (62%), Gaps = 6/310 (1%)

Query: 3
Sbjct: 4

Query: 63
Y G Q VNQI I
Sbjct: 64

Query: 121
IENA QR+ R L +KY + F+TYPAA 4+HH F SGL+YH TM+R+A SI DIYP LN
Sbjct: 122

Query: 181
KSL+++GI+LHD+ KV ELSGP T YT+ GNL+GHIS+ +E+ + E!LNI+ EE+
Sbjct: 182

Query: 241
+BRH+ILSHHG+LEYGSP P + EAEI+ IDNIDA MM A + ++G+ T++IF
Sbjct: 240

Query: 301
++NR FY P
Sbjct: 300 tLENRRFYNP 309

A related DNA sequence was identified in S.pyogenes <SEQ ID 5691> which encodes the amino acid 
sequence <SEQ ID 5692>. Analysis of this protein sequence reveals the following: 

Possible site: 38 

>>> Seems to have no N-terminal signal sequence 



Final Results

bacterial cytoplasm --- Certainty=0.1822 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 275/311 (88%) , Positives = 300/311 (96%) 

Query: 1 MKINQMKKDELFEGFYLI KKAEVRKTRAGKDFIAFTFQDDTGE I SGNMWTJAQTYNVEEFV 60 

MKINQMKKD+LFEGFYLIK AEVRKTRAGKDFI+ TFQDDTGE I SGN+WDAQ YNVEEF 
Sbjct: 1 MKINQMKKDQLFEGFYLIKSAEWKTRAGKDFISLTFQDDTGEISGNLWDAQPYNVEEFT 60 

Query: 61 AGKIVHMKGRREVYNGTPQVNQITLRNIKDGEPNDPRDFKEKPPINVDNVREYMEQMLFK 120 

AGK+V MKGRRE VYNGTPQVNQITLRN+ + -GEPMDP+DFKEK P++V VR+Y+EQMLFK 
Sbjct: 61 AGKVVFMKGRREVYNGTPQvNQITLRlTOPGEPNDPI<DFKEKAPVSVTEVRDYLEQMIjFK 120 
Query: 121 IENATWQRVVRiUjYRKYNKEFFTYPAAKTJfflHAFESGLAYHTATMWIJUDSIGDIYPEIjN 180

IENATWQR+VE^ALYRKY+KEF+TYPAAKTNHHAFESGIAYHTATMVRLMDSIGDIYP+LN 
Sbjct: 121 IENATWQRIVRALYRKYDKEFYTYPAAmSIHH^ 180 

Query: 181 KSLMFAGIMLHDLAKVIELSGPDNTEYTIRGNLIGHISLIDEELTKILAELWIDDTKEEV 240 

KSL+FAGIMLHDLAKVIEL+GPDNTEYT+RGNLIGHISLI+EE+TK+++EL IDDTKEEV 
Sbjct: 181 KSLLFAGIMLHDLAKVIELTGPDNTEYTVR3NLIGHISLINEEITKVISELQIDDTKEEV 240 

Query: 241 TVLRHVILSHHGQLEYGSPTOPRIMEAEIIHMIDNIDAOT^M^TTAIffiVlffiGEMTORIF 300 

VLRHVILSHHGQLEYGSPVRPRIMEAEIIHMIDNIDAM1MMMTTAL+RV+EGEMTNRIF 
Sbjct: 241 I VLRHVILSHHGQLEYGS PVRPRI MEAE I IHMIDNIDANMMMMTTALSRVSEGEMTHRI F 300 

Query: 301 AMDNRSFYKPN 311 

AMDNRSFYKPN 
Sbjct: 301 AMDNRSFYKPN 311 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1834 

A DNA sequence (GBSx1941) was identified in S.agalactiae <SEQ ID 5693> which encodes the amino
acid sequence <SEQ ID 5694>. Analysis of this protein sequence reveals the following:

Possible site: 21 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-14.59 Transmembrane 2 - 18 ( 1 - 22)

Final Results

bacterial membrane --- Certainty=0.6838 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 5695> which encodes the amino acid 
sequence <SEQ ID 5696>. Analysis of this protein sequence reveals the following: 

Possible site: 17 



19 ( 1 - 26) 

Final Results

bacterial membrane --- Certainty=0.5819 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 309/424 (72%) , Positives = 370/424 (86%) , Gaps = 3/424 (0%) 

Query: 1 MLVIILIIVLASLTVTIISYQKMTELTKSVEKQLEDNADNLSDQLTYQIEVAQKDQILTL 60 

+++ +L++VL L ++ K+ L + + LE NADNLSDQ+TYQ++ A K Q+L L 
Sbjct: 3 LILFLLVLVLLGLGAYLLF - - KVNGLQHQLAQTLEGNADNLSDQMTYQLDTANKQQLLEL 60 

Query: 61 TNQLHRMQQEIYQLLTDMRTELNQHLTESRDRSDKRLELINSNLSQSVQKMQDSNEKRLD 120 

T +NR Q +YQ LTD+R L++ L++SRDRSDKRLE IN ++QS++ MQ+SNEKRL+ 
Sbjct: 61 TQLMNRQQAGLYQQLTDIRDVLHRSLSDSRDRSDKRLEKINQQVNQSLKNMQESNEKRLE 120 

Query: 121 Q^QTVEEKLEKTLQTRLQTSFETVSRQLESWQGI^EMKTVAQDVGTMKVLSNTKTRG 180 

+MRQ VEEKLE+TL+ RL SF++VS+QLESVN+GLGEM++VAQDVGTLNKVLSNTKTRG 
Sbjct: 121 KmQIVEEKLEETLECNRLHASFDSVSKQLESWKGLGEMRSVAQDVGTLNKVLSNTKTRG 180 

Query: 181 ILGELQLGQIIEDIMTVSQYEREFPTVSGSSERVEYAIKLPGNGQGDYIYLPIDSKFPLE 240 

ILGELQLGQI IEDIMT SQYEREF TVSGS SERVEYAI KLPGNGQG YIYLPIDSKFPLE 
Sbjct: 181 ILGELQLGQIIEDIMTSSQYEREFVTVSGSSERVEYAIKLPGNGQGGYIYLPIDSKFPLE 240 
Query: 241 DYYRLEDAYELGDKVQIELYRKSLLASIRKFAKDINNKYLNPPETTNFGIMFLPTEGLYS 300

DYYRLEDAYE+GDK+ IE RK+LLA+I++FAKDI+ KYLNPPETTNFG+MFrjPTEGLYS 
Sbjct: 241 DYYRLEDAYEVGDKIAIEASRKALLAAIKRFAKDIHKK^DNPPETTHFGVMFLPTEGLYS 300 

Query: 301 EVVRNATFFDSLRRDENIWAGPSTLSALmELSVGFKTIiNIQKMACIDISKILGNVKVEF 360 

EWRNA+FFDSLRR+ENIWAGPSTLSAIjLNSLSVGFKTLNIQKNA+DISKILGNVK+EF 
Sbjct: 301 EWRNASFFDSLRREENIWAGPSTLSALIJ'JSLSVGFICrLNlQKNADDISKILGNVKLEF 3 60 

Query: 361 GKFGGMLSKAQKQLOTASKSIDSLLTTRTNAIIRVLNTVEEHQDQATTSLLNLPITEEEE 420

KFGG+L+KAQKQ+NTA+ ++D L++TRTNAI+R IOTVE +QDQAT SLLN+P+ EEE 
Sbjct: 361 DKFGGLlJUCAQKQ^MANOTLDQLISTRImIWAI J ^m7ETYQDQATKSLIil^PLLEEEH 420 

Query: 421 INEN 424 
NEN

Sbjct: 421 -NEN 423 



SEQ ID 5694 (GBS88) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 18 (lane 2; MW 48kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1835 

A DNA sequence (GBSx1942) was identified in S.agalactiae <SEQ ID 5697> which encodes the amino
acid sequence <SEQ ID 5698>. Analysis of this protein sequence reveals the following:

Possible site: 44

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2722 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB13453 GB:Z99112 yloS [Bacillus subtilis] 
Identities = 75/217 (34%) , Positives = 109/217 (49%) , Gaps = 12/217 (5%) 

Query: 1 MTKIALFAGG DLTYFEYDFDYFVGIDRGSL.FLLKNGLSLDMAVGDFDSITEDEL 54 

M I + AGG DLT + + ++G+D+G++ LL G+ A GDFDSITE E 

Sbjct: 1 MKT INIVAGGPKNLI PDLTGYTDEHTLWIGVDKGTvTLLDAGI I PVEAFGDFDS ITEQER 60 

Query: 55 LYIKHYCSNIVSASAEKNDTDTEIALKTIFKEFPEAQVTVFGAFGGRIDHMMSNIFLPSD 114 

1+ + AEK+ TD +LAL ++ P+ + +FG GGR DH + NI L 

Sbjct: 61 RRIEKAAPALHVYQAEKDQTDLDIALDWALEKQPDI-IQIFGITGGRADHFLGNIQLLYK 119 

Query: 115 RDLEPFMSQIRLKDEQNIVTYLPSGKNQVSRIEGMSYVSFMPESES--TLQISGAKYELN 172

+ IRL D+QN + P G+ + + E Y+SF+P SE L ++G KY LN 
Sbjct: 120 GVKTNI--KIRLIDKQNHIQMFPPGEYEIEKDENKRYISFIPFSEDIHELTLTGFKYPLN 177

Query: 173 KSNY-FKKKMYSSNEFMTSPIEVELKDGYLIIIYSro 208 

+ + SNE + S G LI+I S D 

Sbjct: 178 NCHITLGSTLCISNELIHSRGTFSFAKGILIMIRSTD 214 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5699> which encodes the amino acid 
sequence <SEQ ID 5700>. Analysis of this protein sequence reveals the following: 

3 N- terminal signal 
Final Results

bacterial cytoplasm --- Certainty=0.2467 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 130/208 (62%) , Positives = 166/208 (79%) 

Query: 1 MTKIALFAGGDLTYFEYDFDYFVGIDRGSLFLLKNGLSLDMA.VGDFDSITEDELLYIKHY 60 

M+K+ALFAGGDL+Y DFDYFVGIDRGSLFLL+NGL L+MAVGDFDS + + + IK 
Sbjct: 1 MSKVALFAGGDLSYISRDFDYFVGIDRGSLFLLENGLPIiNMAVGDFDSVSQKAFTDIKEK 60 

Query: 61 CSNIVSASAEKNDTDTELALKTIFKEFPEAQVTVFGAFGGRIDHMMSNIFLPSDRDLEPF 120 

++A EKNDTDTELALK +F FPEA+VT+FGAFGGR+DH++SNIFLPSD + PF 
Sbjct: 61 AELFITAHPEKNDTDTELALKEVFARFPEAEVTIFGAFGGRMDHLLSNIFLPSDPGIAPF 120 

Query: 121 MSQIRLKDEQNIVTYLPSGKNQVSRIEGMSWSFMPESESTLQISGAKYEIaNKSNYFKKK 180 

M+QI L+D+QN++TY P+G++ + + EGM+YV+FM E E+ L I+GAK+EL + N+FKKK 
Sbjct: 121 MAQIALRDQQNMITYRPAGQHLIHQEEGMTYVAFMAEGEADLTITGAKFELTQDNFFKKK 180 

Query: 181 MYSSNEFMTSPIEVELKDGYLIIIYSKD 208 

+YSSN F+ PI V L GYLIII SKD 
Sbjct: 181 IYSSNAFIHQPITVSLPSGYLIIIQSKD 208 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1836 

A DNA sequence (GBSx1943) was identified in S.agalactiae <SEQ ID 5701> which encodes the amino
acid sequence <SEQ ID 5702>. This protein is predicted to be ribulose-phosphate 3-epimerase (rpe).
Analysis of this protein sequence reveals the following:

Possible site: 18 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.59 Transmembrane 124 - 140 ( 124 - 141) 



Final Results

bacterial membrane --- Certainty=0.1633 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 5 KIAPSILAADYANFANELKRIEETTAEYVHIDIMDGQFVPNISFGADWSSMRKHSKLVF 64
KIAPSIL+AD+AN NE++ +E A+Y+H+D+MDG FVPNI + G +V ++R + L
Sbjct: 3 KIAPSirjSADFANLGNEIQDvERGGADYIHVDVMDGHFVPNITIGPLIVDAIRPVTTLPL 62

Query: 65 DCHLMVVDPERYIEAFAQAGADIMTIHVEATKHIHGALQKIKEAGMKAGVVINPGTPVES 124
D HLM+ P+ YI AFA+AGAD I +T+HVEA H+H L IKE+G+KAGW+NP TPV S
Sbjct: 63 DvHLMIEQPDGYIPAFAKAGADIITVHVEACPHLHRTLHLIKESGVKAGVVLNPATPVSS 122

Query: 125
+L VD +L MTVNPGFGGQ FIP ++ K+K +A+ +KE G ++IEVDGG++ T
Sbjct: 123

Sbjct: 183
A related DNA sequence was identified in S.pyogenes <SEQ ID 5703> which encodes the amino acid
sequence <SEQ ID 5704>. Analysis of this protein sequence reveals the following:

Possible site: 49 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0072 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 183/219 (83%) , Positives = 198/219 (89%) 

Query: 1 MSTNKIAPSILAADYANFANELKRIEETTAEYVHIDIMDGQFVPNISFGADVVSSMRKHS 60
MST KIAPSILAADYANFA+EL RIEET AEYVHIDIMDGQFVPNISFGADVV+SMRKHS
Sbjct: 1 MSTLKIAPSILAADYANFASELARIEETDAEYVHIDIMDGQFVPNISFGADVVASMRKHS 60

Query: 61 KLVFDCHLMVVDPERYIEAFAQAGADIMTIHVEATKHIHGALQKIKEAGMKAGVVINPGT 120
KLVFDCHLMVVDPERY+EAFAQAGADIMTIH E+T+HIHGALQKIK AGMKAGVVINPGT
Sbjct: 61 KLVFDCHLMVVDPERYVEAFAQAGADIMTIHTESTRHIHGALQKIKAAGMKAGVVINPGT 120

Query: 121 PVESLIPILDLVDQILIMTVNPGFGGQAFIPEMMSKVKTVAAWRKEYGHHYDIEVDGGID 180
P +L P+LDLVDQ+LIMTVNPGFGGQAFIPE + KV TVA WR E G +DIEVDGG+D
Sbjct: 121 PATALEPLLDLVDQVLIMTVNPGFGGQAFIPECLEKVATVAKWRDEKGLSFDIEVDGGVD 180

Query: 181 NTTIKAAAEAGANVFVAGSYLFKASDLPAQVETLRVALD 219 

N TI+A EAGANVFVAGSYLFKASDL +QV+TLR AL+ 
Sbjct: 181 NKTIRACYEAGANVFVAGSYLFKASDLVSQVQTLRTALN 219 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
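
The "Identities" and "Positives" figures quoted for each alignment can be recomputed from the three aligned rows of a block. The following is a minimal sketch, assuming the usual BLAST convention that a letter in the middle row marks an identity and a "+" marks a conservative substitution; the function names are hypothetical, and the sample rows are the last block of the GAS/GBS alignment above with the middle row re-spaced by hand so that all three rows have equal length. Percentages in the listings appear to be truncated rather than rounded (for example, 257/290 is printed as 88%), so integer division is used.

```python
def alignment_stats(query: str, midline: str, subject: str):
    """Return (identities, positives, gaps, length) for one aligned block."""
    if not (len(query) == len(midline) == len(subject)):
        raise ValueError("all three alignment rows must have the same length")
    identities = positives = gaps = 0
    for q, m, s in zip(query, midline, subject):
        if q == "-" or s == "-":
            gaps += 1            # a gap in either sequence
        elif q == s:
            identities += 1      # identical residues also count as positives
            positives += 1
        elif m == "+":
            positives += 1       # conservative substitution flagged in the midline
    return identities, positives, gaps, len(query)


def percent(count: int, length: int) -> int:
    """Integer percentage, truncated (assumed to match the printed listings)."""
    return 100 * count // length


# Residues 181-219 of the GAS/GBS alignment; midline spacing restored by hand.
QUERY   = "NTTIKAAAEAGANVFVAGSYLFKASDLPAQVETLRVALD"
MIDLINE = "N TI+A  EAGANVFVAGSYLFKASDL +QV+TLR AL+"
SUBJECT = "NKTIRACYEAGANVFVAGSYLFKASDLVSQVQTLRTALN"

if __name__ == "__main__":
    ident, pos, gaps, length = alignment_stats(QUERY, MIDLINE, SUBJECT)
    print(f"Identities = {ident}/{length} ({percent(ident, length)}%), "
          f"Positives = {pos}/{length} ({percent(pos, length)}%)")
```

Summing the per-block counts over all blocks of an alignment should give the totals printed in its header line.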

Example 1837 

A DNA sequence (GBSx1944) was identified in S.agalactiae <SEQ ID 5705> which encodes the amino 
acid sequence <SEQ ID 5706>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2098 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB13451 GB:Z99112 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 148/296 (50%) , Positives = 202/296 (68%) , Gaps = 14/296 (4%) 

Query: 2 QGRIVKSLAGFYYV ESDGVVYQTRARGNFRKKGQIPYVGDWVEFSSQDQSEGYILS 57 

+G+I+K+L+GFYYV E V Q R RG FRK P VGD+V + +++ EGY++ 
Sbjct: 3 EGKI I KALSGFYYVLDESEDSDKVI QCRGRGI FRKNKITPLVGDYWYQAENDKEGYLME 62 

Query: 58 IEERKNSLVRPPIVNIDQAWIMSAKEPDFNANLLDRFLVLLEYKMIQPIIYISKLDIiLD 117 

I+ER N L+RPPI N+DQAV++ SA +P F+ LLDRFLVL+E IQPII I+K+DL++ 
Sbjct: 63 IKERTNELIRPPICNVDQAVLVFSAVQFSFSTALLDRFLVLVEANDIQPIICITKMDLIE 122 

Query: 118 DLWIDDIR EHYQNIGY-VFCYSQEE LLPLIANKVTVFMGQTGVGKSTLLN 167 

D D 1+ E Y+NIGY V+ S ++ ++P +K TVF GQ+GVGKS+LLN 

Sbjct: 123 DQDTEDTIQAYAEDYRNIGYDVYLTSSKDQDSLADIIPHFQDKTTVFAGQSGVGKSSLIiN 182 



Query: 168 KIAPELKLETGEISGSLGRGRHTTRAVSFYNVHKGKIADTPGFSSLDYEVDNAEDLNESF 227 
I+PEL L T EIS LGRG+HTTR V + G +ADTPGFSSL+4 E+L +F 

Sbjct: 183 AI S PELGLRTNEI SEHLGRGKHTTRHVEL IHTSGGLVADTPGFSSLEFTD I EEEELGYTF 242 

Query: 228 PELRRLSHFCKFRSCTHTHEPKCAVKEALTQGQLWQVRYDNYLQFLSEIESRRETY 283 
P++R S CKFR C H EPKCAVK+A+ G+L Q RYD+Y++F+4EI+ R+ Y 

Sbjct: 243 PDIREKSSSCKFRGCLHLKEPKCAVKQAVED3ELKQYRYDHYVEFMTEIKDRKPRY 298 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5707> which encodes the amino acid 
sequence <SEQ ID 5708>. Analysis of this protein sequence reveals the following: 

Possible site: 17 

>>> Seems to have no N-terminal signal sequence 

----- Final Results -----

bacterial cytoplasm --- Certainty=0.2290 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 230/290 (79%) , Positives = 257/290 (88%) 

Query: 1 MQGRIVKSIAGFYYVESDGWYQTRARGNFRKKGQIPYVGDWVEFSSQDQSEGYILSIEE 60 

+QG+I+KSLAGFYYVES+G VYQTRARGNFRK+G+ PYVGD V+FS++D SEGYIL+I 
Sbjct: 1 LQGKIIKSLAGFYYVESEGQVYQTRARGNFRKRGETPYVGDIVDFSAEDNSEGYILAIHP 60 

Query: 61 RKNSLWPPIVNIDQAWIMSAKEPDFNANLLDRFLVLLEYKMIQPIIYISKLDLLDDLV 120 

RKNSLVRPPIVNIDQAWIMSAKEP+FN+NLLDRFL+LLE+K I P++YISK+DLLD 
Sbjct: 61 RKNSLTOPPIWIDQAWIMSAKBPEFNSNLLDRFLILLEHKAIHPVVYISKMDLLDSPE 120 



Query: 121 VIDDIREHYQNIGYVFCYSQEELLPLLANKVTVFMGQTGVGKSTLLNKIAPELKLETGEI 180 

I I YQ IGY F S EELLPLLA+K+TVFMGQTGVGKSTLLN+IAPEL LE GEI 
Sbjct: 121 EIKAIGRQYQAIGYDFVTSLEELLPLIADKITVFMGQTGVGKSTLLNRIAPELALEIGEI 180 

Query: 181 SGSLGRGRHTTRAVSFYNVHKGKIADTPGFSSLDYEVDNAEDLNESFPELRRLSHFCKFR 240 

S SLGRGRHTTRAVSFYN H GKIADTPGFSSLDY++ NAEDLNE4FPELRRL5H CKFR 
Sbjct: 181 SDSLGRGRHTTRAVSFYNTHGGKIADTPGFSSLDYDIANAEDLNEaFPELRRLSHECKFR 240 



Query: 241 SCTHTHEPKCAVKEALTQGQLWQVRYDNYLQFLSEIESRRETYKKVIKRK 290 

SCTHTHEPKCAVK AL G+LW VRY++YLQFLSEIE+RRETYKKVIKRK 
Sbjct: 241 SCTHTHEPKCAVKAMjETGELWPVRYEHYLQFLSEIENRRETYKKVIKEK 290 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
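
The "Final Results" blocks in these listings all follow one pattern: a compartment name, a certainty value and a verdict. As a hedged illustration of how such a block might be read programmatically, the sketch below parses the three lines and reports the compartment with the highest certainty. The regular expression and function names are assumptions made for this example; the pattern tolerates the stray spaces inside numbers (such as "0 . 0000") that occur in scanned copies of the output.

```python
import re

# One "bacterial <site> --- Certainty=<value> (<verdict>)" line.
# Dashes are optional and spaces inside the number are tolerated.
LINE_RE = re.compile(
    r"^\s*bacterial\s+(\w+)\s*-*\s*Certainty\s*=\s*([\d.\s]+?)\s*\(([^)]*)\)"
)


def parse_final_results(text: str) -> dict:
    """Map each compartment name to a (certainty, verdict) tuple."""
    results = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            site, value, verdict = match.groups()
            results[site] = (float(value.replace(" ", "")), verdict.strip())
    return results


def best_site(results: dict) -> str:
    """Return the compartment with the highest certainty value."""
    return max(results, key=lambda site: results[site][0])


SAMPLE = """\
bacterial cytoplasm --- Certainty=0.2290 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0 . 0000 (Not Clear) < succ>
"""

if __name__ == "__main__":
    parsed = parse_final_results(SAMPLE)
    print(best_site(parsed), parsed[best_site(parsed)])
```

The sample text reuses the values reported for the S.pyogenes protein of this Example.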



Example 1838 

A DNA sequence (GBSx1945) was identified in S.agalactiae <SEQ ID 5709> which encodes the amino 
acid sequence <SEQ ID 5710>. This protein is predicted to be rRNA. Analysis of this protein sequence 
reveals the following: 

Possible site: 17 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.43 Transmembrane 259 - 275 ( 259 - 275) 

Final Results 

bacterial membrane --- Certainty=0.1171 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15937 GB:Z99124 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 95/278 (34%) , Positives = 147/278 (52%) , Gaps = 16/278 (5%) 
Query: 14 SYFACPKCQNPLIKESN-SLKCSDN-HCFDLSKFGYVNLLGGKKVDEHYDKKSFENR-QL 70 

S F CP C + + S SL C++ H FDLS+ GYVN L K V Y + FE R +L 
Sbjct: 8 SMFRCPLCDSSMDAASGKSLICTERGHTFDLSRHGYVNFLT-KPVKTSYGAELFEaRSRL 66 

Query: 71 VLENGYYNHILEAISKVLENNSQFH SVLDIGCGEGFYSRQLVNKHEKTFLAF D 123 

+ E G+++ + +AI++++ + • H ++LD GCGEG + L AD 
Sbjct: 67 IGECGFFDPLHDAIAELISHPKSGHEAFTILDSGCG3GSHLNALCGFDYAGKAAIGTGID 126 

Query: 124 ISKDSIQIAAKSDQSRLVKWFVSDLANLPIQDSSIDIILDIFSPANYKEFRRVLSDDGIL 183 

+SKD I A+K+ + + W V+D+A P D D++L IFSP+NY EF R+L +DG+I1 
Sbjct: 127 LSKDGILKASKAFKDLM--WAVADVARAPFHDRQFDWLSIFSPSNYAEFHRLLKNDGML 184 

Query: 184 VKWPVAEHVQELREKASQYLKQKDYSKQKILDHFRSNFEIISEQKWQSYNCSQQERQA 243 

+KWP ++++ ELR+ ++ YSN ++ F N ++ QQ 

Sbjct: 185 IKWPRSDYLIELRQFLYTDSPRRTYSNTAAVERFTANAMISRPVRLRYVKTLDQQAIHW 244 

Query: 244 FIDMTPLLFSVDKTTIDW ASISEITVGALIVIGKK 278 

+ MTPL +S K + ++ITV I+IG K 

Sbjct: 245 LLKMTPLAWSAPKDRVSLLKEMKSADITVDVDILIGMK 282 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1839 

A DNA sequence (GBSx1946) was identified in S.agalactiae <SEQ ID 5711> which encodes the amino 
acid sequence <SEQ ID 5712>. This protein is predicted to be dimethyladenosine transferase (ksgA). 
Analysis of this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3257 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB11818 GB:Z99104 dimethyladenosine transferase [Bacillus subtilis] 
Identities = 157/284 (55%), Positives = 215/284 (75%), Gaps = 2/284 (0%) 

Query: 3 IADKTVTRAILERHGFTFKKSFGQNFLTDTNILQKIVDTAEIDKGVNVIEIGPGIGALTE 62 

IA T+ IL+++GF+FKKS GQNFL DTNIL +IVD AE+ + VIEIGPGIGALTE 
Sbjct: 5 IATPIRTKEILKKYGFSFKKSLGQNFLIDTNI1JTOIVDHAEVTEKTGVIEIGPGIGALTE 64 

Query: 63 FLAE!NAAEVMAFEIDDRLlPIIiADTLARFDN\ r QVVNQDILKADIjQTQIQA-FKNPDLPIK 121 

LA+ A +V+AFEID RL+PIL DTL+ ++NV V++QD+I1KAD+++ 1+ F++ D I 
Sbjct: 65 QIAKRAKKVVAFEIDQRLLPILKDTLSPYENVTVIHQDVLKADVKSVIEEQFQDCD-EIM 123 



Query: 122 WANLPYYITTPILMHLIESKIPFAEFVVMIQKEVADRISAMPNTKAYGSLSIAVQYYMT 181 

WANLPYY+TTPI+M L+E +P WM+QKEVA+R++A P++K YGSLSIAVQ+Y 
Sbjct: 124 WANLPYYVTTPIIMKLLEEHLPI^GIVVMI^KEVAERMAADPSSKEYGSLSIAVQFYTE 183 

Query: 182 AKVSFIVPRTVFVPAPNVDSAILKMVRRDQPWSVQDEDFFFRVSKVABVHRRKTLWNNL 241 
AK 1VP+TVFVP PNVDSA+++++ RD P V V++E FFF++ K +F RRKTL NNL 
, Sbjct: 184 AKTVMIVPKTVFVPQP1WDSAVIRLILRIX3PAVDVFJESFFFQLIKASFAQRRJCTLLNNL 243 

Query: 242 TSHFGKSEDTKAKLEKALEIAKIKPSIRGEALSIPDFASLADAL 285 

++ + + K+ +E+ LE I RGE+LSI +FA+L++ h 
Sbjct: 244 VNNLPEGKAQKSTIEQVLEETNIDGKRRGESLSIEEFAALSNGL 287 
A related DNA sequence was identified in S.pyogenes <SEQ ID 5713> which encodes the amino acid 
sequence <SEQ ID 5714>. Analysis of this protein sequence reveals the following: 

Possible site: 19 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2420 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 257/290 (88%) , Positives = 275/290 (94%) 

Query: 1 MRIADKTVTRAILERHGFTFKKSFGQNFLTDTNILQKIVDTAEIDKGVNVIEIGPGIGAL 60 
MRIAD +VT+A+L+RHGFTFKKSFGQNFLTDTNILQKIVDTAEID+ VNVIEIGPGIGAL 
Sbjct: 9 MRIADYSVTKAVLDRHGFTFKKSFGQNFLTDTNILQKIVDTAEIDQNVNVIEIGPGIGAL 68 

Query: 61 TEFLftENAAEVMAFEIDDRLIPILADTLARFDNVQVvNQDILKADLQTQIQAFKNPDLPI 120 
TEFLAENAREVMAFEIDDRL+PILADTIi FDNVQWNQDILICADLQTQI+ FKNPDLPI 
Sbjct: 69 TEFLAENAAEVMAFE IDDRLVP I LADTLRDFDNVQWNQD I LKADLQTQI KQFKNPDLPI 128 

Query: 121 KWANLPyYITTPILMHLIESKIPFAEFVvMIQKEVADRISAMPNTKAYGSLSIAVQYYM 180 

KWANLPYYITTPILMHtilESKIPF EFVVM-H2+EVADRISA PNTKAYGSLS I AVQYYM 
Sbjct: 129 lOTVANLPYYITTPILMHLIESKIPFQEFVVMMQREVADRISAEPNTKAYGSLSIAVQYYM 188 


Query: 181 TAKVSFIVPRTVFVPAPNVDSAILKMVRRDQPWSVQDEDFFFRVSKVAFVHRRKTLWNN 240 

TAKV+F1VPRTVFVPAPNVDSAILKMVRRDQP++ V+DEDFFFRVS+++FVHRRKTLWNN 
Sbjct: 189 TAKVAFIVPRWWPAPNVDSAILKMVRRDQPLIEVKDEDFTFRVSRLSFVHRRKTLWNN 248 

Query: 241 LTSHFGKSEDTKAKLEKALEIAKIKPSIRGEALSIPDFAStADALKEVGI 290 

LTSHFGKSED KAKLEK L +A IKPSIRGEALSI DF LADALKEVG+ 
Sbjct: 249 LTSHFGKSEDIKAKLEKGIJilADIKPSIRGEALSIQDFGKIADALKEVGL 298 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1840 

A DNA sequence (GBSx1947) was identified in S.agalactiae <SEQ ID 5715> which encodes the amino 
acid sequence <SEQ ID 5716>. Analysis of this protein sequence reveals the following: 

Possible site: 20 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0736 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1841 

A DNA sequence (GBSx1948) was identified in S.agalactiae <SEQ ID 5717> which encodes the amino 
acid sequence <SEQ ID 5718>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3031 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB11817 GB:Z99104 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 81/179 (45%) , Positives = 117/179 (65%) , Gaps = 4/179 (2%) 

Query: 7 IQEVIWEGKDDTANLRRFyNVDTYETRGSAIDEDDLERIERLHNLRGVIVFTDPDYNGE 66 

Sbjct: 

Query: 67 RIRKIII^AIPTVRHAFLNRDEAKPGSKTKGRSLGVEHASFEDLQKALSKVTQHFDDEDH 126 

+IRK I A+P +HAFL + AKP +K R +GVERAS E ++ L V + + + 
Sbjct: 63 KIRKTISEAVPGCKHAFLPKHLAKPKNK RGIGVEHASVESIRACLENVHEEMEAQPS 119 

Query: 127 FDITQADLIRWGFITASDSRKRREYLGNQLRIGYSNGKQLLKRLRLFGVTKAEVEECME 185 

DI+ DLI G I ++ RRE LG+ L+IGY+NGKQL KRL++F + K++ ++ 
Sbjct: 120 -DISAEDLIHAGLIGGPAAKCRRERLGDLLKIGYTNGKQLQKRLQMFQIKKSDFMSALD 177 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5719> which encodes the amino acid 
sequence <SEQ ID 5720>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1474 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 146/187 (78%) , Positives = 165/187 (88%) 

Query: 1 mKKIDIQEVIVVEGKDDTANLRRFYNVDTYETRGSAIDEDDLERIERLHNLRGVIVFTD 60 

+ +KI + 1 QEV+ WEGKDDTANLRRFY VDTYETRGSAI E+DLERI RL++LRGVIV TD 
Sbjct: 15 LTEKINIQEVLVVEGKDDTANLRRFYEVDTYETRGSAITEEDLERINRLNDLRGVIVLTD 74 

Query: 61 PDYNGERIRKI IMNAI PTVKlAFIjlSniDEAKPGSKTKGRSLGVEHASFEDLQKALSKVTQH 120 

PDYNGERIRK+IM A+PT RHAFLNR+EA P SK+KGRSLGVEHA+FEDLQKAL+ VTQ 
Sbjct: 75 PDYNGERIRKLIMAAVPTARHAFLNRNFAVPSSKSKGRSLGVEHANFEDLQKALAHVTQQ 134 

Query: 121 FDDEDHFDITQADLIRWGFITASDSRKRREYLGNQLRIGYSNGKQLLKRLRLFGVTKAEV 180 

+DDE +FDI Q DLIR G + ASDSRKRREYLG +LRIGY+NGKQLLKRL LFG+T AEV 
Sbjct: 135 YDDESYFDIRQTDLIRLGLLMASDSRKRREYLGEKLRIGYANGKQLLKRLELFGITLftEV 194 

Query: 181 EECMEGY 187 

EE ME Y 
Sbjct: 195 EEVMETY 201 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
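
Each alignment is preceded by a summary line of the form "Identities = 146/187 (78%) , Positives = 165/187 (88%)". Because scanned copies of such lines can contain digit errors, a simple consistency check is sometimes useful. The sketch below, with hypothetical names, parses a summary line and tests whether each printed percentage agrees with its fraction; both truncation and rounding are accepted, since the convention used by the alignment program is an assumption here.

```python
import re

# Matches e.g. "Identities = 146/187 (78%)" with optional extra spaces.
PAIR_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def check_summary(line: str) -> dict:
    """Map each field to (count, length, printed_percent, is_consistent)."""
    report = {}
    for name, num, den, pct in PAIR_RE.findall(line):
        num, den, pct = int(num), int(den), int(pct)
        truncated = 100 * num // den
        rounded = int(100 * num / den + 0.5)
        report[name] = (num, den, pct, pct in (truncated, rounded))
    return report


if __name__ == "__main__":
    line = "Identities = 146/187 (78%) , Positives = 165/187 (88%)"
    for field, values in check_summary(line).items():
        print(field, values)
```

A field reported as inconsistent indicates only that the line should be compared against the original listing; it does not show which of the numbers is wrong.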

Example 1842 

A DNA sequence (GBSx1949) was identified in S.agalactiae <SEQ ID 5721> which encodes the amino 
acid sequence <SEQ ID 5722>. Analysis of this protein sequence reveals the following: 

Possible site: 15 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4955 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10139> which encodes amino acid sequence <SEQ ID 
10140> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB11815 GB:Z99104 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 115/254 (45%) , Positives = 172/254 (67%) 





Query: 28 IFDTHTHLNVENFEGKIDEEINLASELGVTKJ1NWGFDQDTISKSLELSSQYAQVYSTIG 87
+FDTH HLN E ++ ++E I A V ++ WGFD+ TI++++E+ +Y +Y+ IG
Sbjct: 2 LFDTHAHLNAEQYDTDLEEVIERAKAEKVERIVWGFDRPTITRAMEMIEEYDFIYAAIG 61

Query: 88 WHPTEAGSYDDNIESMIISHLENPKVIALGEIGLDYYWMEDPKDIQIEVFKRQIELSKEY 147
WHP +A + + I + KV+A+GE+GLDY+W + PKDIQ EVF+ QI L+KE
Sbjct: 62 WHPVDAIDMTEEDLAMIKELSAHEKWAIGEMGLDYHWDKSPKDIQKEVFRNQIALAKEV 121

Query: 148 NLPFVVHTRDALEDTYEVIKESGVGPFGGIMHSFSGSLEMAQKFIDLGMMISFSGVVTFK 207
NLP ++H RDA ED ++KE G GGIMH F+GS E+A++ + + +SF G VTFK
Sbjct: 122 NLPIIIHNRDATEDVVTILKEEGAEAVGGIMHCFTGSAEVARECMKMNFYLSFGGPVTFK 181

Query: 208 KALDVQEAARELPLDKILVETDAPYLAPVPKRGRENKTAYTRYVVEKIAELRGITVEEVA 267
A +E +E+P D++L+ETD P+L P P RG+ N+ +Y +YV E+IAEL+ +T EE+A
Sbjct: 182 NAKKPKEWKEIPNDRLLIETDCPFLTPHPFRGKRKEPSYVKYVAEQIAELKEMTFEEIA 241

Query: 268 EATYQNAVRIFRLD 281
T +NA R+FR++
Sbjct: 242 SITTENAKRLFRIN 255





A related DNA sequence was identified in S.pyogenes <SEQ ID 5723> which encodes the amino acid 
sequence <SEQ ID 5724>. Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2817 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 190/258 (73%), Positives = 227/258 (87%) 



Query: 24 83
+ + IFDTHTHLNV F+G EE+ LA E+GV NWGFDQ TIS +L L+++YA +Y
Sbjct: 38 EKLTIFDTHTHI^TOffiFCGHETEELTLAQEMGVAYHNVVGFDQATISGALTLANKYANIY 97

Query: 84 STIGWHPTEAGSYDDNIESMIISHLENPKVIALGEIGLDYYWMEDPKDIQIEVFKRQIEL 143
+TIGWHPTEAGSY + +E I+S L + KVIALGEIGLDYYWMEDPK++QIEVFKRQ++L
Sbjct: 98 ATIGWHPTEAGSYSEAVEEAIVSQLSHSKVIALGEIGLDYYWMEDPKEVQIEVFKRQMQL 157

Query: 144 SKEYNLPFVVHTRDALEDTYEVIKESGVGPFGGIMHSFSGSLEMAQKFIDLGMMISFSGV 203
+K+++LPFVVHTRDALEDTYEVIK +GVGP GGIMHS+SGSLEMA++FI+LGMMISFSGV
Sbjct: 158 AKDHDLPFVVHTRDALEDTYEVIKAAGVGPRGGIMHSYSGSLEMAERFIELGMMISFSGV 217



Query: 204 VTFKKALDVQEAARELPLDKILVETDAPYLAPVPKRGRENKTAYTRYVVEKIAELRGITV 263 

VTFKKALD+QEAA+ LPLDKILVETDAPYL PVPKRG++N TAYTRYW+KIAELRG+TV 
Sbjct: 218 VTFKKALDIQEAAQHLPLDKILVETDAPYLTPVPKRGKQNHTAYTRYVVDKIAELRGMTV 277 
Query: 264 EEVAEATYQNAWIFRLD 281 

EEVA+AT NA R+F+LD 
Sbjct: 278 EEVAKATTANAKRVFKLD 295 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1843 

A DNA sequence (GBSx1950) was identified in S.agalactiae <SEQ ID 5725> which encodes the amino 
acid sequence <SEQ ID 5726>. This protein is predicted to be endosome-associated protein. Analysis of 
this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.5142 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1844 

A DNA sequence (GBSx1951) was identified in S.agalactiae <SEQ ID 5727> which encodes the amino 
acid sequence <SEQ ID 5728>. This protein is predicted to be CGI 7785 gene product. Analysis of this 
protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4730 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1845 

A DNA sequence (GBSx1952) was identified in S.agalactiae <SEQ ID 5729> which encodes the amino 
acid sequence <SEQ ID 5730>. Analysis of this protein sequence reveals the following: 

Possible site: 45 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4032 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB01041 GB:AB022220 gene_id:MLN21.14~unknown protein 
[Arabidopsis thaliana] 
Identities = 49/185 (26%) , Positives = 85/185 (45%) , Gaps = 46/185 (24%) 

Query: 5 LTDLDRVNIAKQEYELGSQLDTLVKIMSQDKvLPIGKVAHVQ DGGKETGEQIYT 58 

L +D V+ + + ELGS+ + +M+ K+ V+ D K+ Q+4 

Sbjct: 154 LEGIDSVDSGRVKIELGSRGLMD^CTMASKLAYENAKl^NLVEFLDCWiroYQKQMSTQVPV 213 

Query: 59 ITPNGTLDKPEDVKEVTVLFKGSTAPFGGDDWKTD WFKNDIPIASKL LLKKFG 111 

T DK +D + + F+G T PF DDW TD W+ ++P KL L+ G 

Sbjct: 214 FT DKQKDANLIVISFRG-TEPFDADDKGTDFDYSWY--EVPKVGKLHMGFLEAMG 265 

Query: 112 --SQSVSHKQGTKQ LEQSAH LLKEVMNKYPNAKISVY 14 S 

Q+ S ++ +K+ +E+SA+ +LK +++++ NA+ V 

Sbjct: 266 LGNRDDTTTFHYl^FEQTSSEEENSKKNLLDMVERSAYYAVRVILKRLLSEHENARFVVT 325 

Query: 147 GHSLG 151 

GHSLG 
Sbjct: 326 GHSLG 330 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1846 

A DNA sequence (GBSx1953) was identified in S.agalactiae <SEQ ID 5731> which encodes the amino 
acid sequence <SEQ ID 5732>. Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -8.97 Transmembrane 12 - 28 ( 5 - 33) 

Final Results 

bacterial membrane --- Certainty=0.4588 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10141> which encodes amino acid sequence <SEQ ID 
10142> was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

A related GBS gene <SEQ ID 8909> and protein <SEQ ID 8910> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 4 
McG: Discrim Score: 14.01 
GvH: Signal Score (-7.5): -5.55 

Possible site: 46 
>>> Seems to have an uncleavable N-term signal seq 
ALOM program count: 1 value: -8.97 threshold: 0.0 

INTEGRAL Likelihood = -8.97 Transmembrane 6 - 22 ( 1 - 27) 

*** Reasoning Step: 3 

----- Final Results -----

bacterial membrane --- Certainty=0.4588 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



SEQ ID 8910 (GBS32) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 10 (lane 2; MW 15.6kDa). 

GBS32-His was purified as shown in Figure 191, lane 8. 
Example 1847 

A DNA sequence (GBSx1954) was identified in S.agalactiae <SEQ ID 5733> which encodes the amino 
acid sequence <SEQ ID 5734>. This protein is predicted to be extramembranal protein (dltD). Analysis of 
this protein sequence reveals the following: 



Possible site: 31 

>>> Seems to have an uncleavable N-term signal seq 
INTEGRAL Likelihood =-10.24 Transmembrane 



28 ( 



Final Results 

bacterial membrane --- Certainty=0.5097 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 





Query: 1 MLKRLGKVFGPLVCALLLLVGLYFVFPVSQ-PHHLGKEKNSAVALTKAGFKSRVQKVRAF 59
MLKRL + GP+ CAL+L+ L +P H+ +EKN AVAL+ + FKS +K+RA
Sbjct: 1 MLKHLWLILGPVFC^VLVFSLIMFYPAKHLSHNYNEEKNDAVALSPSSFKSTNKKMRAL 60

Query: 60 SDPKANFVPFFGSSEWLRFDAMHPSVLAEAYNRSYIPYLLGQKGAASLTQYYGIQQIKGQ 119
SD + FVPFFGSSEW R D MHPSVLAE YNRSY PYLLGQKG+ SL+ Y+G+QQI Q
Sbjct: 61 SDKRHLFVPFFGSSEWQRIDNMHPSVLAERYNRSYRPYLLGQKGSTSLSHYFGMQQIGNQ 120

Query: 120
IKNKKA+YVISPQWFV KG + AFQ YFS + -I-Q FL NQTG+T DRYAA+RLL 4
Sbjct: 121

Query: 180
Sbjct:

Query: 240
FSY LS+IAS+D + T++NQF I+D FY RIK LK+LKG Q+ -t-Y 4
Sbjct: 241

Query: 300
QL L + +K T V+FVIPPVN KW +YTGL Q MYQK+VEKIK+QLQSQGF++1ADLS+
Sbjct: 301

Query: 360
+G +PYFMQDTIHLGWNGWL DK +NPFL++4- 4-P Y INN FL K WA YTG P +K
Sbjct: 361



A related DNA sequence was identified in S.pyogenes <SEQ ID 5735> which encodes the amino acid 
sequence <SEQ ID 5736>. Analysis of this protein sequence reveals the following: 
Possible site: 41 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-13.06 Transmembrane 7 - 23 ( 1 - 31) 

Final Results 

bacterial membrane --- Certainty=0.6222 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 209/410 (50%) , Positives = 278/410 (66%) 

Query: 1 MLKRLGKVFGPLVCALLLLVGLYFVFPVSQPHHLGKEKNSAVALTKAGFKSRVQKVRAFS 60 

MLKRL + GPL+ A +L+V F FP H + +EK +AVA+T + FK+ + K +A S 
Sbjct: 1 MLKRLWLILGPLLIAFVLWITIFSFPTQLDHSIAQEKANAVAITDSSFKNGLIKRQALS 60 



Query: 121 KNKKAIWISPQWEWKGANKGAFQNYFSNDQTIRFLQNQTGTTYDRYAARRLLKLYPEA 180 

+ KKAI+V+SPQWF +G N A Q Y SN Q I FL ++AA+RLL+L P 

Sbjct: 121 QKKKAIFWSPQWFTAQGINPSAVQMYLSNTQVIEFLLKARTDKESQFAAKRLLELNPGV 180 

Query: 181 SMSDLIEKVADGQKLSNKDKQRLKFNDWVFEKTDAIFSYLPLGKTYNQAIMPHVGKLPKA 240 

S S+L++KV+ G+ LS D+ LK V + +++FS+L Y + I+P V LPK 

Sbjct: 181 SKSNLLKKVSKGKSLSRLDRAILKCQHQVALREESLFSFLGKSTNYEKRILPRVKGLPKV 240 

Query: 241 FSYNHLSRIASQDAKVATRSNQFGIDDRFYQTRIKKHLKKLKGSQRHFNYTKSPBFNDLQ 300 

FSY L+ +A++ ++AT +N+FGI + FY+ RI K Q +++Y SPE+ND Q 

Sbjct: 241 FSYKQLNAIATKRGQLATTNNRFG I KNTF YRKR I AP KYNLYKNFQVNYSYLAS PEYNDFQ 3 00 

Query: 301 LVLNEFSKQNTDVLWIPPWKICWTDYTGLDQKMYQKSVEKIKHQLQSQGFNHIADLSRD 360 

L+L+EF+K+ TDVLFVI PVNK W DYTGL+Q YQ +V KIK QL+SQGF+ IAD S+D 
Sbjct: 301 LLLSEFAKRKTDVLFVITPVNKAWADYTGLNQDKYQAAVRKIKFQLKSQGFHRIADFSKD 360 

Query: 361 GGKPYFMQDTIHLGWNGWLELDKHINPFLTEENSKPNYHINNKFLKKSWA 410 

GG+ YFMQDTIHLGWNGWL DK + PFL + PNY +N F K WA 
Sbjct: 361 GGESYFMQDTIHLGWNGWLAFDKKVQPFLETKQPVPNYKMNPYFYSKIWA 410 

A related GBS gene <SEQ ID 8911> and protein <SEQ ID 8912> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 6 
McG: Discrim Score: 15.50 
GvH: Signal Score (-7.5): -4.52 

Possible site: 31 
>>> Seems to have an uncleavable N-term signal seq 
ALOM program count: 1 value: -10.24 threshold: 0.0 

INTEGRAL Likelihood =-10.24 Transmembrane 12 - 28 ( 4 - 31) 
PERIPHERAL Likelihood = 8.33 301 
modified ALOM score: 2.55 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.5097 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

57.5/76.3% over 420aa 

Streptococcus mutans 

GP| 3403204 | unknown Insert characterized 



WO 02/34771 



PCT/GB01/04789 




ORF00336 (301 - 1560 of 1860) 

GP|3403204|gb|AAC29041.1| |AF050517(1 - 421 of 421) unknown {Streptococcus mutans} 
%Match =41.0 

%Identity =57.5 %Similarity =76.2 
Matches = 242 Mismatches = 99 Conservative Sub.s = 79 

33 63 93 123 153 1S3 213 243 

FSGFLDLLWFPQPHNK**GVL*WIl^QKY*QLLMTYLWr^FLL* 

10 273 303 333 363 420 450 480 

WMTGTQLIKLLLE*RSS2WLKRLGKVFGPLVCALLLLVGL 

lllll = = lh 111=1= I =1=1= =111 1111= = III =1=11=1 
MLKRLWLILGPVFCALvLVFSLIMFYPAKHLSHNYMEEKNDAVALSPSSFKSTNKKMRflLS 
10 20 30 40 50 60 

15 

510 540 570 600 630 660 690 720 

DPKANFVPFFGSSEWLRFDAMHPSVLAFJVYl^SYIPYLLGQKGAASLTQYYGIQQIKGQIKNKKAIWISPQWFVRKGMJ 

i = iiMiiiiii i i mum urn mmm n = = i = i = iu mmmmimi 11 = 

DKRHLFVPFFGSSEWQRlDNMHPSVLAERYl^SYRPYLLGQKGSTSLSHYFGMQQIGNQIKNKKAVYVISPQWFVPKGTS 
20 80 90 100 110 120 130 140 

750 780 810 840 870 900 930 960 

KGAFQISryFSNDQTIRFLQNQTGTTYDRYAARRLLKLYPEASMSDLIEKVADGQKLSNKDKQRLKFNDWVFEKTDAIFSYL 
III lll= = l II 1111 = 1 11111 = 111 = I = = = =1 = 1 = 1 1= l = = 1= h = = = l 11 = 1 I 

25 PIAFO^YFSSEQLADFLLNQTGSTADRYAAKRLLDIKPSSNLQGMIKKIAAGKTLNSFDRASLRLIKSFLKKEDALFGSL 
160 170 180 190 200 210 220 

990 1020 1050 1080 1110 1140 1170 1200 

PLGKTYNQAIMPHVGKLPKAFSYlfflLSRIASQDAKVATRSNQFGIDDRFYQTRIKKHLKKLKGSQRHFNYTKSPEFNDLQ 

30 : - 1,, iii in i hi I hi = mm hi ii ii mi hi 1 1 mm mmm 

TFSDirYERRVl.PHvTCKLPKHFSYGTLSQIASKDGQRLTKTNQFEINDHFYNKRIKGQLKRLKGFQKQLSYLQSPEVNDLQ 
240 250 260 270 280 290 300 

1230 1260 1290 1320 1350 1380 1410 1440 

3 5 LVLNEFSKQNTDVLFVI PPWKKWTDYTGLDQKMYQKSVEKI KHQLQSQGFNHIADLSRDGGKPYFMQDTIIILGWNGWLE 

i i : = = i i himm ii :im i iiihimhmiiih = imh:i mmmumm 

LALTQIxAKSKTKVIFVIPPVNAKWVEYTGLSQDMYQKTv^ 

320 330 340 350 360 370 380 

40 1470 1500 1530 1560 1590 1620 1650 16B0 

LDKHINPFLTEENSKPNYHINNKFLKKSWAKYTGRPSDYK*IVESDDL*H*SY*SSFLISLYLVILR*LIHVIj*FFIYNE 

=11 =1111=== =1 I III II I II III I =1 

FDKEVNPFIjSKKQLQPAYKINNHFLSKKWATYTGNPFQFK 
400 410 420 

45 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
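
The membrane-topology lines in these listings ("INTEGRAL Likelihood =-10.24 Transmembrane 12 - 28 ( 4 - 31)") carry a likelihood value, a predicted segment and an extended range in parentheses. The following sketch shows one way such lines could be collected and ranked; it is an illustration only, the class and function names are hypothetical, and ranking by most negative likelihood is an assumption based on the listings, where INTEGRAL segments have negative values and the PERIPHERAL entry a positive one.

```python
import re
from typing import List, NamedTuple


class Segment(NamedTuple):
    likelihood: float   # value printed after "Likelihood ="
    start: int          # first residue of the predicted segment
    end: int            # last residue of the predicted segment
    outer_start: int    # first residue of the range in parentheses
    outer_end: int      # last residue of the range in parentheses


SEG_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(text: str) -> List[Segment]:
    """Return all INTEGRAL segments, most negative likelihood first."""
    segments = [
        Segment(float(lik), int(a), int(b), int(c), int(d))
        for lik, a, b, c, d in SEG_RE.findall(text)
    ]
    return sorted(segments, key=lambda seg: seg.likelihood)


SAMPLE = """\
INTEGRAL Likelihood =-10.24 Transmembrane 12 - 28 ( 4 - 31)
INTEGRAL Likelihood =-13.06 Transmembrane 7 - 23 ( 1 - 31)
INTEGRAL Likelihood = -1.59 Transmembrane 124 - 140 ( 124 - 141)
"""

if __name__ == "__main__":
    for seg in parse_segments(SAMPLE):
        print(seg, "length:", seg.end - seg.start + 1)
```

In the sample lines, which are taken from different Examples in this section, every predicted segment spans 17 residues.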

Example 1848 

A DNA sequence (GBSx1955) was identified in S.agalactiae <SEQ ID 5737> which encodes the amino 
acid sequence <SEQ ID 5738>. This protein is predicted to be D-alanyl carrier protein (dltC). Analysis of 
this protein sequence reveals the following: 

Possible site: 21 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1061 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC05776 GB:AF051356 D-alanyl carrier protein [Streptococcus mutans] 
Identities = 65/79 (82%), Positives = 74/79 (93%) 

Query: 1 MDIKSEVLAIIDDLFMEDVSSMMDEDLFDAGVLDSMGTVELIVELESHFDJIDIPIAEFGR 60 

MDIKSEVL IID+LFMEDVS MMDEDLFDAGVLDSMGTVELI VELE+HF+I +P++EFGR 
Sbjct: 1 MDIKSEVLKIIDELFMEDVSDMMDEDLFDAGVLDSMGTVELIVELENHFDITVPVSEFGR 60 

Query: 61 NDWNTANKIVAGVTELCNA 79 

+DWNTANKI+ G+TEL NA 
Sbjct: 61 DDWNTANKIIEGITELRNA 79 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5739> which encodes the amino acid 
sequence <SEQ ID 5740>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3976 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 57/79 (72%), Positives = 65/79 (82%) 



Query: 1 ^IKSEVIAIIDDLFMEDVSSMMDEDLEI)AGVLDSMGTVELIVELESHFNIDIPIAEFGR 60 
M 1+ V+ + D LFMEDVS MMDEDLFDAGVLDS+GTVELIVELES FNI +PI+EFGR 

Sbjct: 1 MSIEETV1ELFDRLFMEDVSEMMDEDLFDAGVLDSLGTVELIVELESTFNIKVPISEFGR 60 

Query: 61 NDWNTANKIVAGVTELCNA 79 
+DWNT KIV GV EL +A 
Sbjct: 61 DDWNTVTKIVQGVEELQHA 79 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1849 

A DNA sequence (GBSx1956) was identified in S.agalactiae <SEQ ID 5741> which encodes the amino 
acid sequence <SEQ ID 5742>. Analysis of this protein sequence reveals the following: 



Possible site: 16 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -8.55 Transmembrane 93 - 109 ( 91 - 117) 
INTEGRAL Likelihood = -7.64 Transmembrane 21 - 37 ( 18 - 39) 
INTEGRAL Likelihood = -6.79 Transmembrane 390 - 406 ( 387 - 410) 
INTEGRAL Likelihood = -5.20 Transmembrane 41 - 57 ( 40 - 59) 
INTEGRAL Likelihood = -2.07 Transmembrane 203 - 219 ( 200 - 221) 
INTEGRAL Likelihood = -1.65 Transmembrane 65 - 81 ( 65 - 81) 
INTEGRAL Likelihood = -0.75 Transmembrane 125 - 141 ( 125 - 141) 

Final Results 

bacterial membrane --- Certainty=0.4418 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 5743> which encodes the amino acid 
sequence <SEQ ID 5744>. Analysis of this protein sequence reveals the following: 



Likelihood =-10.14 
Likelihood = -9.66 
Likelihood = -5.95 
Final Results

bacterial membrane --- Certainty=0.5055 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 



Query: 1 MMMFFSHIPYMEPYGNPIYFVYLILAFLPVIIGIFKQKRLSTYETLVSLVFILFMFGGDH 60 

M+ FF ++P++E YGNP YF Y+ILA LP+ IG+F +KR YE VSL+FI+ M G+ 
Sbjct: 1 MIDFFKNLPHLEAYGNPQYFFYIILAVLPIFIGLFFKKRFPLYEAFVSLIFIVLMLTGEK 60 

Query: 61 YQQLVAFLFYLLWQI I SVFAYQKYRENANSAGVFYIAIAMALFPLIWVKVAPLTGPSSQT 120 

Q+ A FY++WQI V++Y+ YR++ ++ +FYL + M++ PL VK+ P + Q+ 
Sbjct: 61 SHQIFALFFYI IWQI FCVYSYKFYRKSRDNKWI FYLHVFMS ILPLSLVKITPAIWTNQQS 120 

Query: 121 LFSFLGISYLTFKSIGMIIEMRDGTLQEVRLPDFIRFMIFFPTFSSGPIDRFRHFQEDYH 180 

LF FLGI SYLTF+S+GMI +EMRDG L +FIRFM+F PTFSSGPIDRFR F +DY 

Sbjct: 121 LFGFLGISYLTFRSVGMIMEMRDGVLTSFTFWEFIRFMLFMPTFSSGPIDRFRRFNDDYE 180 

Query: 181 KLPEPJDDYFAMLNKAVMYLMLGFLYKHIISYCLGGIIjLPLLENKALMVGGYFNKETILVM 240 

K+P++D+ ML ++V Y+MLGF YK +++ LG ++LP L+ AL GG+EN T+ VM 
Sbjct: 181 KIPDKDELLDMLEQSVHYIMLGFFYKFVLAQILGTMILPGLKEMALQKGGWFNWPTLGVM 240 

Query: 241 YVYGLNLFFDFAGYSMFAIGISYLLGIRTPENFNMPFLSASLKDFWNRWHMSLSFWFRDY 300 

YVYGL+LFFDFAGYSMFAI IS +GI++P NFN PF S LK+FWNRWHMSLSFWFRD+ 
Sbjct: 241 YVYGLDLFFDFAGYSMFAIAISNFMGIKSPTNFNQPFKSQDLKEFMNRWHMSLSFWFRDF 300 

Query: 301 VFMRLWLLIKHKTFKNROTTSGVAYLVKMLVMGFWHGLTWYYIAYGLFHGIGLIINDAW 360 

VFMRLV +L+K+K FKNRNVTS VAY+VNML+MGFWHG+TWYYI YGLFHG+GL++NDAW 
Sbjct: 301 VFMRLVKVTjVTCNKVFKNRNVTSSVAYIVNMLIMGFWHGVTW^ 360 

Query: 361 IRKKKEINRHRKKKGLSPLFQSRAFHVLCIWTFHVVMFSLLLFSGFLNDLWF 413 

+RKKK +N+ RK K LSPL ++ L IV+TF+WM S L+FSGFLNDLWF 

Sbjct: 361 LRKKKRIjNKERKAKNLSPLPENGWTRALGIVITFNVVMLSFLIFSGFLNDLWF 413 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 240/416 (57%), Positives = 317/416 (75%), Gaps = 5/416 (1%)

Query: 65 LKSLLAYWGQSLLVFIYKAYRKRFNHTLVFYVTVCLSIFPLFLVKLIPAISEDGHQSLF 124 

L + L Y++ Q + VF Y+ YR+ N VFY+ + +++FPL VK+ P ++ Q+LF 
Sbjct: 64 LVAFLFYLLWQIISVFAYQKYRENANSAGVFYLAIAMALFPLIWVKVAP-LTGPSSQTLF 122 

Query: 125 GFLGISYLTFRAVAMIIEMRDGVLKEFTLWEFLRFLLFFPTFSSGPIDRFKRFNEDYINI 184 

FLGISYLTF+++ MIIEMRDG L+E L +F+RF++FFPTFSSGPIDRF+ F EDY + 
Sbjct: 123 SFLGISYLTFKSIGMIIEMRDGTLQEVRLPDFIRFMIFFPTFSSGPIDRFRHFQEDYHKL 182 

Query: 185 PDRNELLDMLGQAIHYLMLGFLYKFILAYIFGSLIMPPLKELALEQGGVFNWPTLGVMYA 244 

P+R++ ML +A+ YLMLGFLYK I++Y G +++P L+ AL GG FN T+ VMY 
Sbjct: 183 PEPJDDYFAMI^NKAvMYLMLGFLYKHIISYCLGGILLPLLENKALMVGGYFNKETILVMYV 242 

Query: 245 FGFDLFFDFAGYTMFALAISI^LMGIKSPINFDKPFKSRDLKEFWNRWHMSLSFWFRDFVF 304 

+G +LFFDFAGY+MFA+ IS L+GI++P NF+ PF S LK+FWNRWHMSLSFWFRD+VF 
Sbjct: 243 YGLNLFFDFAGYSMFAIGISYLLGIRTPENFNMPFLSASLKDFWNRWHMSLSFWFRDYVF 302 

Query: 305 MRLVKLLVKNKVFKNRNVTSSVAYIIIMLMGFWHGLTWYYIAYGLFHGIGLVINDAWVR 364 

MRLV LL+K+K FKNRNVTS VAY++NML+MGFWHGLTWYYIAYGLFHGIGL+INDAW+R 
Sbjct: 303 I^LVHLLIKHKTFKNRNWSGVAYLVNMLVMGFWHGLTWYYIAYGLFHGIGLIINDAWIR 362 
Query: 365 KKKNINKERRLAKKPLLP--ENKWTYALGVFITFNVVMFSFLIFSGFLDLLWFPQP 418
KKK IN+ R+ ' KK L P +++ + L + +TF+WMFS L+FSGFL+ LWF +P 
Sbjct: 363 KKKE INRHRK- - KKGLSPLFQSI^FHVLCIVOTFHVVMFSIJjLFSGFIilTOLMFNRP 416 

A related GBS gene <SEQ ID 8913> and protein <SEQ ID 8914> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10
McG: Discrim Score: 3.22
GvH: Signal Score (-7.5): -4.56
Possible site: 16
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 7 value: -8.55 threshold: 0.0
INTEGRAL Likelihood = -8.55 Transmembrane 93 - 109 ( 91 - 117)
INTEGRAL Likelihood = -7.64 Transmembrane 21 - 37 ( 19 - 39)
INTEGRAL Likelihood = -6.79 Transmembrane 390 - 406 ( 387 - 410)
INTEGRAL Likelihood = -5.20 Transmembrane 41 - 57 ( 40 - 59)
INTEGRAL Likelihood = -2.07 Transmembrane 203 - 219 ( 200 - 221)
INTEGRAL Likelihood = -1.65 Transmembrane 65 - 81 ( 65 - 81)
INTEGRAL Likelihood = -0.75 Transmembrane 125 - 141 ( 125 - 141)
PERIPHERAL Likelihood = 1.01 322
modified ALOM score: 2.21

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.4418 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

ORF01206(313 - 1563 of 1863) 

GP|2952530|gb|AAC05775.1||AF051356(4 - 419 of 420) integral membrane protein {Streptococcus mutans}
%Match = 50.3
%Identity = 71.0 %Similarity = 86.6
Matches = 296 Mismatches = 55 Conservative Sub.s = 65



TFDTKWEN*YQRSYERGKQVIQAFLEKLPHLDvYGNEQYFFYLIIjAVLPIYIGLFFKKRFALYEIIFSLSFIvMMLTGST 
I:: lllh I I I I M I I h I II I I I I » I I I I I I I I I III II llhllll 
MIDFFKNLPHLEAYGNPQYFFYIIIiAVLPIFIGLFFKKRFPLYEAFVSLIFIVLMLTGEK 



FNQLKSLLAYWGQSLLVFIYKAYRKRFNHTLVFYVTVCLS I FPLFLVKLIPAISEDGHQSLFGFLGISYLTFRAVAMI I 
=|: =|: l = = I = Ml 111 - =11= I =11=11 111= III = =111111111111111=1 11= 
SHQIFALFFYIIWQIFCVYSYKFYRKSRDNKWIFYLHVFMSILPLSLVKITPAIWTN-QQSLFGFLGISYLTFRSVGMIM 



EMRDGVLKEFTLWEFLRFLLFFPTFSSGPIDRFKRFNEDYINIPDRNELLDMLGQAIHYLMLGFLYKFILAYIFGSLIMP 

limn ii = iii = ii = iriiiiiiiiiii = iii=n iii==iiiin i = = ii = im = m = n i = i==i = i 

EMRDGVLTSFTFWEFIRFMLFMPTFSSGPIDRFRRFNDDYEKi PDKD3LLDMLEQSVHYIMLGFFYKFVLAQILGTMILP 



993 1023 1053 1083 1113 1143 1173 1203 

PLKELALEQGGVFNWPTLGV^AFGFDLFFDFAGYTMFAIAISNL^^ 
lll=ll==ll llllllllll =l=lllllllll=lll=llll=llllll ll==llll=lllllllllllllllllll 
60 GLKEMALQKGGWFIWPTLGVMYVYGLDLFFDFAGYSMFAIAISNFMGIKSPTNF^ 

230 240 250 260 270 280 290 

1233 1263 1293 1323 1353 1383 1413 1443 

FVFMRLVKLLVKNKVFKNRNVTSSVAYIINMLLMG^^ 
65 I! I : , = i I : I I i i I I I ; I I i I I ^ I I : = I I , I ! I = IIIII|:|||=HI|:|||| =1111= I 
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PVFMRLVKVLVKNKVFKNRNVTS SVAYI VNML I MGFWHGVTWYY I TYGLFHGVGLVTjNDAWLRKKKRLNKERKAKNIjSPL 
310 320 330 340 350 360 370 

1473 1503 1533 1563 1593 1623 1653 1683 

5 PENKWTYALGVFITFNVVMFSFLIFSGFLDLLWFPQPHNK**GVL*WILNQKY*QLLMTYLWRMFLL*WMKTYLTQEP*T 
ill II III: llllllhllllllllh III H 
PENGWTRALGIVITFNWMLSFLIFSGFLNDLWFADQLSKK 
390 400 410 420 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1850 

A DNA sequence (GBSx1957) was identified in S.agalactiae <SEQ ID 5745> which encodes the amino
acid sequence <SEQ ID 5746>. Analysis of this protein sequence reveals the following:

Possible site: 45
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2611 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10143> which encodes amino acid sequence <SEQ ID 
10144> was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC05774 GB:AF051356 D-alanine-D-alanyl carrier protein ligase 
[Streptococcus mutans] 
Identities = 404/510 (79%), Positives = 465/510 (90%) 

IHDMIKTIEHFAETQADFPVYDILGEVHTYGQLKVDSDSLAAHIDSLGLVEKSPVLVFGG 64 
I DMI TIE+FA+ QA+FPVY+ILGE+HTYG+IiK DSDSLAAH+D L L KSPV+VFGG 
IKDMIATIENFAQEQAEFPVYNILGEIHTYGELKADSDSLAAHLDQLDLTAKSPVWFGG 65 

QEYEMLATFVALTKSGHAYIPVDQHSALDRIQAIMTVAQPSLIISIGEFPLEVDNVPILD 124 
QEY MLA+FVALTKSGHAYI P+D HSAL+RI+AI+ VA+PSL+I++ +FP++ VP++ 



SQ+ IF++K Y++ H+VKGDD YYIIFTSGTTG PKGVQISHDNLLSFTNWMI+ + 







Sbjct: 


6 


Query: 


65 


Sbjct: 


66 


Query: 


125 


Sbjct: 


126 


Query: 


185 


Sbjct: 


186 




245 


Sbjct: 


246 


Query: 


305 


Sbjct: 


306 




365 


Sbjct: 


366 




425 


Sbjct: 


426 



F+ P RPQMLAQPPYSFDLSVMYWAPTLA+GGTLFALPX + DFK+LF TIN+LPI ^ 



TSTPSF DMA+LS+DFN++ LP LTHFYFDGEELTVKTA+KLRQRFP+ARIVNAYGPTEA 



TVALSA+A+TD+MIiETCKRLPIGYTK DSPT++IDE GHKL NG+QGEII++GPAVSKGY 



LNNPE+TA AFF+FEGLPAYHTGDLGSMTDEGLLLYGGRMDFQIKFNGYRIELE+VSQNL 



NKSQY+ SAVAVPRYNKDHKVQNLLAY+VLK+GV + FER LD+TKAIK DL+D+MMDYM 



WO 02/34771 



PCT/GB01/04789 






Query: 485 MPSKFIYREDLPLTPNGKIDIKGLMSEWK 514 

MPSKF+YR+DLPLTPNGKIDIKGLMSEVNK 
Sbjct: 486 MPSKFLYRKDLPLTPNGKIDIKGLMSEVNK 515 

A related DNA sequence was identified in S. pyogenes <SEQ ID 5747> which encodes the amino acid 
sequence <SEQ ID 5748>. Analysis of this protein sequence reveals the following: 

Possible site: 60 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.28 Transmembrane 92 - 108 ( 91 - 108) 
INTEGRAL Likelihood = -0.85 Transmembrane 43 - 59 ( 41 - 59) 
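
The INTEGRAL lines in these reports follow a regular layout (likelihood, predicted transmembrane span, then a wider range in parentheses). The sketch below shows one way such lines could be read into structured records; the regular expression and the `Segment` type are illustrative assumptions, not part of the prediction program itself.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    likelihood: float
    start: int
    end: int


# Tolerates both "Likelihood = -2.28" and "Likelihood =-13.64" spacing.
_INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_segments(report: str) -> list[Segment]:
    """Extract every predicted transmembrane segment from a report excerpt."""
    return [
        Segment(float(m.group(1)), int(m.group(2)), int(m.group(3)))
        for m in _INTEGRAL.finditer(report)
    ]


report = """\
INTEGRAL Likelihood = -2.28 Transmembrane 92 - 108 ( 91 - 108)
INTEGRAL Likelihood = -0.85 Transmembrane 43 - 59 ( 41 - 59)
"""

segments = parse_segments(report)
for seg in segments:
    # Each predicted span in the reports shown here covers 17 residues.
    print(seg.start, seg.end, seg.end - seg.start + 1, seg.likelihood)
```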



The protein has homology with the following sequences in the databases: 

>GP:AAC05774 GB:AF051356 D-alanine-D-alanyl carrier protein ligase 
[Streptococcus mutans] 
Identities = 365/511 (71%), Positives = 438/511 (85%) 

Query: 2 IKDMIDSIEQFAQTQADFPVYDCLGERRTYGQLKRDSDSIAAFIDSLALLAKSPVLVFGA 61 

IKDMI +IE FAQ QA+FPVY+ LGE TYG+LK DSDS+AA +D L L AKSPV+VFG 
Sbjct: 6 IKDMIATIENFAQEQAEFPVYNILGEIHTYGELKADSDSLAAHLDQLDLTAKSPWVFGG 65 

Query: 62 QTYDMLATFVALTKSGHAYIPVDVHSAPERILAIIEIAKPSLIIAIEEFPLTIEGISLVS 121 

Q Y MLA+FVALTKSGHAYIP+D HSA ERI AI+E+A+PSL+IA+++FP+ + ++ 
Sbjct: 66 QEYAMLASFVALTKSGHAYIPIDHHSALERIEAILEVAEPSLVIAVDDFPIDNLQVPVIQ 125 

Query: 122 LSEIESAKLAEMPYERTHSWGDDNYYIIFTSGTTGQPKGVQISHDNLLSFTNWMIEDAA 181 

S++E ++ Y+ H+VKGDD YYIIFTSGTTG+PKGVQISHDNLLSFTNWMI A 

Sbjct: 126 YSQLEE I FKQKLS YQINHAV KGDDTYYI I FTSGTTGKPKGVQI SHDNLLS FTNWMINAEA 185 

Query: 182 FDVPKQPQMLAQPPYSFDLSVMYWAPTLALGGTLFALPKELVADFKQLFTTIAQLPVGIW 241 

F P +PQMLAQPPYSFDLSVMYWAPTLALGGTLFALPKE+ ADFKQLFTTI QLP+G+W 
Sbjct: 186 FATPHRPQMLAQPPYSFDLSVMYWAPTLALGGTLFALPKEITADFKQLFTTINQLPIGW 245 

Query: 242 TSTPSFADMAMLSDDFCQAKMPALTHFYFDGEELTVSTARECLFERFPSAKIIMAYGPTEA 301 

TSTPSF DMAMLSDDF ++P LTHFYFDGEELTV TA+KL +RFP A+I+NAYGPTEA 
Sbjct: 246 TSTPSFVDMAMLSDDFNAQQLPHLTHFYFDGEELTVKTAKKLRQRFPQARIVNAYGPTEA 305 

Query: 302 TVALSAIEITREMVDNYTRLPIGYPKPDSPTYIIDEDGKELSSGEQGEIIVTGPAVSKGY 361 

TVALSA+ +T +M++ RLPIGY KPDSPT+IIDE G +L++G+QGEI IV+GPAVSKGY 
Sbjct: 306 TW^SALAVTDKMLETCKRLPIGYTKPDSPTFIIDESGHKLAWGQQGEIIVSGPAVSKGY 365 

Query: 362 LNNPEKTAEAFFTFKGQPAYHTGDIGSLTEDNILLYGGRLDFQIKYAGYRIELEDVSQQL 421 

LNNPE+TA AFF F+G PAYHTGD+GS+T++ +LLYGGR+DFQIK+ GYRIELE+VSQ L 
Sbjct: 366 LNNPERTAAAFFEFEGLPAYHTGDLGSMTDEGLLLYGGRMDFQIKFNGYRIELEEVSQNL 425 

Query: 422 NQSPMVASAVAVPRYNKEHKVQNLLAYIWKDGVKERFDRELELTKAIKASVKDHMMSYM 481 

N+S +ASAVAVPRYNK+HKVQNLLAY+V+KDGV+E+F+R L++TKAIKA ++D MM YM 
Sbjct: 426 NKSQYIASAVAVPRYNKDHKVQNLIAWVLKDGVEEQFERALDITKAIKADLQDVMMDYM 485 

Query: 482 MPSKFLYRDSLPLTPNGKIDIKTLINEVNNR 512 

MPSKFLYR LPLTPNGKIDIK L++EVN + 
Sbjct: 486 MPSKFLYRKDLPLTPNGKIDIKGLMSEVNKK 516 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 374/510 (73%) , Positives = 439/510 (85%) 

Query: 4 MIHDMIKTIEHFAETQADFPVYDILGEVHTYGQLKVDSDSLAAHIDSLGLVEKSPVLVFG 63 

MI DMI +IE FA+TQADFPVYD LGE TYGQLK DSDS+AA IDSL L+ KSPVLVFG 
Sbjct: 1 MIKDMIDSIEQFAQTQADFPVYDCLGERRTYGQLKRDSDSIAAFIDSLALLAKSPVLVFG 60 



Final Results

bacterial membrane --- Certainty=0.1914 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

Query: 64 GQEYEMLATFVALTKSGHAYIPVDQHSALDRIQAIMTVAQPSLIISIGEFPLEVDNVPIL 123

Q Y+MLATFVALTKSGHAYIPVD HSA +RI AI+ +A+PSLII+I EFPL ++ + ++ 
Sbjct: 61 AQTYDMLATFVALTKSGHAY1PVDVHSAPERILAIIEIAKPSLIIAIEEFPLTIEGIELV 120 

Query: 124 DVSQVSAIFEEKTPYEVTHSVKGDDNYYIIFTSGTTGLPKGVQISHDNLLSFTNWMISDD 183 

+S++ + + PYE THSVKGDDNYYIIFTSGTTG PKGVQISHDNLLSFTNWMI D 
Sbjct: 121 SLSEIESAKIAEMPYERTHSVKGDDNYYIIFTSGTTGQPKGVQISHDNLLSFTNWMIEDA 180 

10 Query: 184 EFSVPERPQMLAQPPYSFDLSVKmAPTL^MGK-TLFALPKTVVNDFKKLFATINELPIQV 243 

F VP++PQMLAQPPYSFDLSVMYWAPTIA+GGTLFALPK +V DFK+LF TI +LP+ + 
Sbjct: 181 AFDVPKQPQMLAQPPYSFDLSVMYWAPTLALGGTLFALPKELVADFKQLFTTIAQLPVGI 240 



Query: 244 WTSTPSFADMALLSNDFNSETLPQLTHFYFDGEELTOKTAQKLRQRFPKARIVNAYGPTE 303 

WTSTPSFADMA+LS+DF +P LTHFYFDGEELTV TA+KL +RFP A+I+NAYGPTE 
Sbjct: 241 WTSTPSFADMAMLSDDFCQAKMPALTHFYFDGEELTVSTARKLFERFPSAKI INAYGPTE 300 

Query: 304 ATVALSAVAITDEMLETCKRLPIGYTKDDSPTYVIDEEGHKLPNGEQGEIIIAGPAVSKG 363 

ATVALSA+ IT EM+4- RLPIGY K DSPTY+IDE+G +L +GEQGEII+ GPAVSKG 
Sbjct: 301 ATVALSAIEITREMVDNYTRLPIGYPXPDSPTYI IDEDGKELSSGEQGEI IVTGPAVSKG 360 

Query: 364 YLNNPEKTAEAFFQFEGLPAYHTGDLGSMTDEGLLLYGGRMDFQIKFNGYRIELEDVSQN 423 

YLNNPEKTAEAFF F+G PAYHTGD+GS+T++ +LLYGGR+DFQIK+ GYRIELEDVSQ 
Sbjct: 361 YLNNPEKTAEAFFTFKGQPAYHTGDIGSLTEDNILLYGGRLDFQIKYAGYRIELEDVSQQ 420 

Query: 424 LMKSQWKSAVAVPRYNKDHKVQNLLAYIVLKEGVRDDFERDLDLTKAIKEDLKDIMMDY 483 

LN+S V SAVAVPRYNK+HKVQNLLAYIV+K+GV++ F+R+L+LTKAIK +KD MM Y 
Sbjct: 421 LNQSPMVASAVAVPRYNKEHKVQNLliAYIWKDGVKERFDREIiELTKAIKASVKDHMMSY 480 

Query: 484 MMPSKFIYREDLPLTPNGKIDIKGLMSEVN 513 

MMPSKF+YR+ LPLTPNGKIDIK L++EVN 
Sbjct: 481 MMPSKFLYRDSLPLTPNGKIDIKTLINEVN 510 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1851 

A DNA sequence (GBSx1958) was identified in S.agalactiae <SEQ ID 5749> which encodes the amino
acid sequence <SEQ ID 5750>. This protein is predicted to be a histidine protein kinase (phoR). Analysis of
this protein sequence reveals the following:

Possible site: 26
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood =-13.64 Transmembrane 9 - 25 ( 5 - 32)
INTEGRAL Likelihood =-11.62 Transmembrane 136 - 152 ( 132 - 164)

Final Results

bacterial membrane --- Certainty=0.6456 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB54569 GB:AJ006392 histidine kinase [Streptococcus pneumoniae] 
Identities = 105/416 (25%) , Positives = 197/416 (47%) , Gaps = 56/416 (13%) 

Query: 7 KKFVFLTMSILIVVVLFLFAVSNRYNQYWDEYDAYRIVKLVAKNDY LGIPGDEPIAL 63 

55 + F+F+ + + ++V+ L + NR + + ++ L+A DY L + G I 

Sbjct: 12 RDFIFILILLGFILWTLLLLENRRDNIQLKQWQKVKDLIA-GDYSKVLDMQGGSEITN 70 

Query: 64 VTIDNQKMVKIQSNMTDLTNDVIEKSSLKL LEQGKKSRKWKSFIYSIKE 112 

+T + + ++ LT + +E+ S+L +G + +11 + 

60 Sbjct: 71 ITNNLNDLSEV IRLTQENLEQESKRLNSILFYMTDGVLATNRRGQI IMINDTAKKQ 126 
Query: 113 ---YKDKTYTIAIMDLASYEVPYARRFLILVFT IFGFCLLAAVSLYLSR 158

K+ , +I++L EYRLI ' I G L V L R 

Sbjct: 127 LGLVKEDVLNRSILELLKIEENYELRDLITQSPELLLDSQDINGEYLNLRVRFALIRRES 18S 

Query: 159 -FIVGPVE TEMTREKQ FVSDASHELKTPIAAIRANVQVLEQ QIPGNR 204 

FI G V TE +E++ FVS+ SHEL+TP+ 4444 ++ L++ 4 

Sbjct: 187 GFISGLVAVLHDTTEQEKEERERRLFVSNVSHELRTPLTSVKSYLEALDEGALCETVAPD 246 

Query: 205 YLDHWSETKR^FLIEDLLNLSRLDEICRSKVNFKKIjNLSVLCQEVLLTYESLAYEEEKC 264 

44 + ET RM ++ DLL4LSR4D S ++ 4 +N + 4L 44 4 4E44 

Sbjct: 247 FIKVSLDETNRMMRMVTDLLHLSR1DNATSHLDVELINFTAFITFILNRFDICMKGQEKEK 306 

Query: 265 LNDTIED DVWIVGEESQIKQILIILLDNAIRHSLSKSAIQFSLKQARRKAILTISN 320 

4 4 D 4W4 4 44 Q44 4li4NAl44S I 4K 4 IL4IS4 

Sbjct: 307 KYELVRDYPINSIIMEIDTDKMTQWDNILNNAIICYSPDGGKITVRMKTTEDQMILSISD 366 

Query: 321 PSAIYSKEVMDNLFERFYQAKDDHADSLS FGLGLSIAKAIVERHKGRIRAYQE 373 

K+ 4 4F4RFY4 D A S 4 GLGLSIAK I444HKG I A E 
Sbjct: 367 HGLGIPKQDLPRIFDRFYRV--DRARSRAQGGTGLGLSIAKEIIKQHKGFIWAKSE 420 

A related sequence was also identified in GAS <SEQ ID 9131> which encodes the amino acid sequence 
<SEQ ID 9132>. Analysis of this protein sequence reveals the following: 

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood =-11.30 Transmembrane 9 - 25 ( 4 - 33)
INTEGRAL Likelihood =-10.35 Transmembrane 161 - 177 ( 154 - 190)
PERIPHERAL Likelihood = 4.35 142

Final Results

bacterial membrane --- Certainty=0.5522 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 94/406 (23%), Positives = 190/406 (46%), Gaps = 31/406 (7%) 
Query: 1 MFSDLRKKFVFLTMSILIVWLFLFAVSNRYNQYWDEYDAYRIVKLVAKNDYLGIPGDEP 60 

MF4 4R 4F4 4 4 444 4 4 N Y 4 4 Rl4 L44 N 4PG 

Sbjct: 10 MFNRIRIRFIMIASIAIFIILSSIVGIINTARCYQSQQEINRILHLISSNKGK-LPGTTE 68 

Query: 61 IAL VTIDNQKMVKIQS NNTDLTNDVIEKS SLKLLE QGK 98 

4 44 D4 4 S N L4+4 S4L E 4 K 
Sbjct: 69 SSKRLGTKLSEDSLSQFRYYSVIFNANGHLLSSNTANISALDREEAQYFARLFAKSGEEK 128 

Query: 99 KSRKWKSFIYS- -IKEYKDKTYTIAIMDLASYEVPYARRFLILVFTIFG-FCLLAAVSLY 155 

5 4 4 4YS I 4 44 4 I4D Y 4 V FG F 4 
Sbjct: 129 GSYRHQDSVYSYLITQLPNEEKLWILDTTFYFRSVGDLLAVSVMLAFGGFIFFWLVSL 188 

Query: 156 LSRF1VGPVETEMTREKQFVSDASHELKTPIAAIRANVQVLEQQIPGNRYLDHWSETKR 215 

S 44 P 4444F444A HELKTP4A I AN 444E 4 4 4 KR 

Sbjct: 189 FSGWIKPFVQNYEKQRRFITNAGHSLKIPLAIISANNELVELMTGESEWTKSTSDQVKR 248 

Query: 21S ^FLIEDLLNLSRLDEKRSKVNFKKLNLSVLCQEVLLTYESIAYEEEKCLNDTIEDDVWI 275 

4 LI 44 L4RL4E4 V 44 S 4 Q4 44SL 44 K 4 Tl4 44 I 

Sbjct: 249 LTGL1NQMITLARLEEQPDW-LHI4VDFSAIAQDAAEDFKSLVLKDGKRFDLTIQPNIMI 3 07 

Query: 276 VGEESQIKQILIILLDNAIRHSLSKSAIQFSLK- - -QARRKAILTISNPSAIYSKEVMDN 332 

EE 4 444 IL4DNA 44 K 44 SL 4 R44A L 4SN 
Sbjct: 308 KAEEKSLFELVTI LVDNANKYCDPKGLVKVS LIT I GRRRKRAKLEVSNTYLEGKS IDYSR 367 

Query: 333 LFERFYQAKDDH-ADSLSFGLGLSIAKAIVERHKGRIRAYQEKDQL 377 

FERFY4 4 H 4 4G4GLS4A444V4 KG I 4 D 4 
Sbjct: 368 FFERFYREDESHNSKEKGYGIGLSMAESMVKLFKGTITVNYKNDAI 413 
A related GBS gene <SEQ ID 8915> and protein <SEQ ID 8916> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 7
McG: Discrim Score: 17.50
GvH: Signal Score (-7.5): -2.9
Possible site: 26
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 2 value: -13.64 threshold: 0.0
INTEGRAL Likelihood =-13.64 Transmembrane 9 - 25 ( 5 - 32)
INTEGRAL Likelihood =-11.62 Transmembrane 136 - 152 ( 132 - 164)
PERIPHERAL Likelihood = 2.49 345
modified ALOM score: 3.23

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.6456 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
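
In the report above, the "ALOM program" line gives a count of 2 and a value of -13.64 against a threshold of 0.0, which matches the number of INTEGRAL segments and the lowest likelihood listed. The sketch below reproduces that summary under the assumption, inferred only from the figures printed in these reports, that segments scoring below the threshold are called INTEGRAL and the reported value is the minimum likelihood. The function name is hypothetical.

```python
def alom_summary(likelihoods: list[float], threshold: float = 0.0) -> tuple[int, float]:
    """Return (number of segments below the threshold, lowest likelihood).

    Assumed reading of the report: segments below the threshold are labelled
    INTEGRAL, the others PERIPHERAL.
    """
    integral = [value for value in likelihoods if value < threshold]
    return len(integral), min(likelihoods)


# Likelihoods listed for SEQ ID 8916: two INTEGRAL segments and one PERIPHERAL.
count, value = alom_summary([-13.64, -11.62, 2.49])
print(f"count: {count} value: {value} threshold: 0.0")
```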

The protein has homology with the following sequences in the databases:

28.3/57.2% over 371aa
Listeria monocytogenes
GP|6117973| LisK Insert characterized

ORF00341(631 - 1452 of 1785)
GP|6117973|gb|AAF03933.1|AF139908_3|AF139908(105 - 476 of 483) LisK {Listeria monocytogenes}
%Match = 8.4
%Identity = 28.2 %Similarity = 57.1
Matches = 79 Mismatches = 113 Conservative Sub.s = 81



VKLVAKNDYLGIPGDEPIALVTIDNQKMVKIQSNNTDLTM3VIEKSSLKLLEQGKKSRKWKSF 

I : : I I =1 :|= = h > I = I I == I « 

C^IGQMLLNEEEPEVKELLLATTSTLTNQDLTDNEEIKYLFNNDKTVNRKLQDQVINLYDKDGHFINKYYFSRSQDITSI 



DLASYEVPYARRFLI LVFTI FG FCLLAAVSLYLSRFI- - 

h= I I :| = = II I II M = 1 = = I : 

DFSQYFVSGTDKFIMNKPTIDGQKMMTAQMPIV7ADDNTTVIGYAQVVNPLTSYNRMMDRLLVTMILLGAVALFISGMLGY 



: :|| -llll lllllhll: = 
LLAQNFIiNPLTRIjARTMNDIRKNGFQKRIETKTNSRDEIGELTWFM3MMTRIETSFEQQKQFVEDASHELRTPVQIMEG 
210 220 230 240 250 260 270 

918 948 978 1008 1038 1068 1098 

NVQVLEQ - - - QI PG- -NRYLDHWSETKRMEFLIEDLLJ^SRLDEKRSKVNFKKIjNLSVLCQEVLLTYESLAYEEEKC^ 
::::| = 1=1= ::| :||> |::::|:||'| == :: = = : = = l =1 = II 

HLKLLTRWGKDDPAVLDESIJ^LTELERMKK^^ 

290 300 310 320 330 340 350 

1128 1158 1188 1218 1248 1278 1308 1335 

DTIEDDWIVGEESQIKQILIILLDNAIRHSLSKSAIQFSLKQARRKAILTISNPSAIYSKEVMDNLFERFYQA-KDDHA 

: |: : : : : : : | | | | | = : | | 1 = : : | : : :::::::: |:| :| =| |||= I 

ICEDDTDLRALIQHNHLEQILIIIMDNAVKYSGDGTEVDMIWYKEQKQIHIDVRDYGEGISQEEIDKIFNRFYRVDKARSR 
370 380 390 400 ' 410 420 430 

1365 1395 1425 1452 1482 . 1512 1542 1572 

DSLSFGLGLSIAKAIvBRHKGRIRAYQEKDQ-LRLEVQL^IDGFWTNTMIN*RKNDETIFIFYW*OTIILRYFIVTNLLF 
lllhlll :H = I I I 11= : = : II '= 
EKGGNGLGLAIAKQLVEGYLGTINAVSEPDKGTTIKITBPYIEPKSK
450 460 470 480 

SEQ ID 5750 (GBS34) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 15 (lane 9; MW 69kDa).
GBS34-GST was purified as shown in Figure 1 93, lane 9.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1852 

A DNA sequence (GBSx1959) was identified in S.agalactiae <SEQ ID 5753> which encodes the amino
acid sequence <SEQ ID 5754>. This protein is predicted to be a two-component response regulator (regX3).
Analysis of this protein sequence reveals the following:

Possible site: 30
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1986 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
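
Each "Final Results" block lists three candidate locations with a certainty value, and the location carrying the highest certainty is the one marked "Affirmative". A minimal sketch of reading such a block follows; the regular expression is an assumption written to tolerate the stray spaces that appear inside the numbers in this text, and `best_location` is a hypothetical helper name.

```python
import re

# Matches e.g. "bacterial cytoplasm --- Certainty=0. 1986" and "Certainty=0 . 0000".
_RESULT = re.compile(r"bacterial\s+(\w+)\W+Certainty\s*=\s*(\d)\s*\.\s*(\d+)")


def best_location(block: str) -> tuple[str, float]:
    """Return the location with the highest certainty in a Final Results block."""
    scores = {
        m.group(1): float(f"{m.group(2)}.{m.group(3)}")
        for m in _RESULT.finditer(block)
    }
    name = max(scores, key=scores.get)
    return name, scores[name]


block = """\
bacterial cytoplasm --- Certainty=0. 1986 (Affirmative) < succ>
bacterial membrane --- Certainty=0 . 0000 (Not Clear) < succ>
bacterial outside --- Certainty=0 . 0000 (Not Clear) < succ>
"""

print(best_location(block))
```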

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB04091 GB:AP001508 two-component response regulator [Bacillus halodurans] 
Identities = 98/223 (43%) , Positives = 145/223 (64%) , Gaps = 5/223 (2%) 

Query: 2 RLLVVEDEKSIAEAIQALIADKGYSVDIAPDGDDGLEYILTGLYDLVLLDIMLPKRSGLS 61
 R+L++EDEK IA +Q L +GY D AF G DGLE +DLVLLD+MLP+ SGL

Sbjct: 3 RILIIEDEKKIARVLQLEIjEHEGYETDARFSGSDGLETFQAHAWDLVLLDVMLPELSGLE 62 

Query: 62 vlkrtoeagletpiifltaksqtydkvngldlgaddyitkpfeadellarir--lrtrqs 119 

VL+R+R TPII LTA++ DKV+GLDLGA+DYITKPFE +ELLAR+R LRT Q+ 

30 Sbjct: 63 VLRRIRMTDPVTPlILLTATiNSIPDKVSGLDLGANDYITKPFEIEELLARVRACLRTVQT 122 

Query: 120 SLIRMIQLRLGNIRLNTDSHELESKESSVKIiSlWEFIjLMFA'FMRNAKQIIPICNQLISKVK 179 

+ L + +N + +++ +++L+ KEF L+ F++N Q++ + Q+++ VW 
Sbjct: 123 RERVEDTLMFQELTINEKTRDVQRGNETIELTPKEFELLVFFIKNKGQVLSREQILTNVW 182 

35 

Query: 180 GPSDNSEYNQLEVF I S FLRKKLRFLKAD I E 1 1 CTKGFGYS LEE 222 

G + N ++V++ +LRKKL +A + T +G GY L+E 
Sbjct: 183 GFDYYGDTNVIDVYVRYLRKKLSLTEA LOTVRGVGYRLKE 222 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1853 

A DNA sequence (GBSx1960) was identified in S.agalactiae <SEQ ID 5755> which encodes the amino
acid sequence <SEQ ID 5756>. This protein is predicted to be a 50S ribosomal protein L34-related protein.
Analysis of this protein sequence reveals the following:

Possible site: 32 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.5923 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.



Query: 1 MKRTYQPSKIRRQRKHGFRHRMSTKNGRRVLASRRRKGRKVLSA 44

MKRT+QPS +4R R HGFR RM+TKNGR+VLA RR KGRK LSA 
Sbjct: 1 MKRTFQPSVLKRSRTHGFRARMATKNGRQVLARRRAKGRKSLSA 44 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5757> which encodes the amino acid 
sequence <SEQ ID 5758>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.5385 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 42/44 (95%) , Positives = 44/44 (99%) 

Query: 1 MKRTYQPSKIRRQRKHGFRHRMSTKNGRRVLASRRRKGRKVLSA 44 

+KRTYQPSKIRRQRKHGFRHRMSTKNGRRVLA+RRRKGRKVLSA 
Sbjct: 1 VKRTYQPSKIRRQRKHGFRHRMSTKNGRRVLAARRRKGRKVLSA 44 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1854 

A DNA sequence (GBSx1961) was identified in S.agalactiae <SEQ ID 5759> which encodes the amino
acid sequence <SEQ ID 5760>. Analysis of this protein sequence reveals the following:

Possible site: 61 

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -5.79 Transmembrane 122 - 138 ( 115 - 141)
INTEGRAL Likelihood = -4.35 Transmembrane 19 - 35 ( 15 - 40)

Final Results

bacterial membrane --- Certainty=0.3314 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF95990 GB:AE004350 conserved hypothetical protein [Vibrio cholerae]
Identities = 79/145 (54%) , Positives = 117/145 (80%) 

1 MKTFVNNASKTVLSLWFGVMPTIMTVGTIALIISVSTP1FKILGTPFLPFLELLGIPEAD 60 

+++ + + + + FGV+P +M +GTIAL+I+ T +F +LG PF+PFLELLG+PEA 
314 VQSVIGEGIRNAVDMVFGVLPWMGLGTIALVIAEYTSVFSLLGQPFIPFLELLGVPEAT 373 

61 IASQTMIVGFSDMWPSIMAAEIHSEMTRFIVATVSIVQLIYMSETGAVILGSKIPINIL 120 

AS+T++VGF+DM +P+I+AA I +EMTRF++A +S+ QLIYMSE GA++LGS+IP+NI + 
374 AASKTIWGFADMFIPAILAASIDNEMTRFVIAAMSVTQLIYMSEVGALLLGSRIPVNIV 433 

121 ELFI IFIERTI I SLPIIVLMAHLFF 145 

ELF+IFI RT+I+LP+I +AHL F 
434 ELFVIFILRTLITLPVIAAVAHLLF 458 



No corresponding DNA sequence was identified in S.pyogenes. 
Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1855 

A DNA sequence (GBSx1962) was identified in S.agalactiae <SEQ ID 5761> which encodes the amino
acid sequence <SEQ ID 5762>. This protein is predicted to be D,D-carboxypeptidase (dacA-2). Analysis of
this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2443 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9485> which encodes amino acid sequence <SEQ ID 9486> 
was also identified. A further related GBS nucleic acid sequence <SEQ ID 10945> which encodes amino 
acid sequence <SEQ ID 10946> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA67776 GB:X99400 D, D-carboxypeptidase [Streptococcus pneumoniae] 
Identities = 193/383 (50%), Positives = 282/383 (73%), Gaps = 5/383 (1%)



S+ASN+P+E R YTV++L++A ++SSANSAAIALAE I+G+E FVD M A+L +WGI 



Query: 


1 


Sbjct: 


33 


Query: 


61 


Sbjct: 


93 


Query: 


121 


Sbjct: 


153 


Query: 


181 


Sbjct: 


213 


Query: 


241 


Sbjct: 


273 




301 


Sbjct: 






357 


Sbjct: 


391 



D+ +VN +GLNN LG++IYP S +++ENK+SA D+AIVA +L+ +YP +L+IT K 



+ S NYML MP +R G GDKTGTT+ AG+SF+ T+ E GMR++TV+++AD D 



+ YARFTAT+SL++YI++T+ ++ +G AY+ +A V+DGKE +VIAVA D+ +++ 
NNPYARFTATSSLMDYISSTFTLRKIVQQGDAYQDSKAPVQDGKEDTVIAVAPEDIYLIE 332 

KKNITKQNQLKINF- - - KKELTAPITKKENLGKAYYVDLNfWGKGYLI KE- PSVHLVAKD 355 
+ + Q+ + F K + AP+ +G Y D + +G4GY+ E PS +VA 

R--VGNQSSQSVQFTPDSI<AIPAPIiEAGTWGHLTYEDKDLIGQGYITTERPSFEMVADK 390 

SIERSFFLKVWWNHFVRYWEKL 379 

IE++FFLKVWWN FVR+VNEKL 
KIEKAFFLKVWWNQFVRFVKEKL 413 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5763> which encodes the amino acid 
sequence <SEQ ID 5764>. Analysis of this protein sequence reveals the following: 

Possible site: 21 



>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 176/380 (46%), Positives = 257/380 (57%), Gaps = 3/380 (0%) 

Query: 1 ^VDLDSGKILYEKDANKP7AAIASLTKIMIVYMVYKEIDNGNLKWTKVNISDYPYQLTR 60 

+AVDL+SGK+LYEKDA 4- +AS++K++T Y+VYKE+ Q L W++ V IS+YPY+LT 
Sbjct: 33 IAVDLESGKVLYEKDAKEWPVASVSKLLTTYLVYKEVSKGKLNWDSPVTISNYPYELTT 92 

Query: 61 ESDASNVPLEKRRYTVKQL VDAAMI S SANSAAI ALAEH I SGTESKFVDKMTAQLEKWGI H 120 

SNVPL+KR+YTVK+L+ A ++++ANS AIALAE I GTE KFVDKM QL +WGI 
Sbjct: 93 NYTISWPLDKRKYTVKELLSALVVNNANSPAIALAEKIGGTEPKFVDKMKKQLRQWGIS 152 

Query: 121 DSHLWASGLNNSMLGNHIYPKSSQNDENKMSARDXAIVAYHLVNEYPSILKITSKSVAK 180 

D+ +VN++GL N LG 4- YP + +DEN A D+AI+A HL+ E+P +LK++SKS 
Sbjct: 153 DAKWNSTGLTraFLGANTYPNTEPDDENCFCATDIAriARHLLLEFPEVLKLSSKSSTI 212 

Query: 181 FDKDIMHSYNYMLPDMPVFRPGITGLKTGTTELAGQSFIATSTESGMRLLTVIMHADKAD 240 

F ++SYNYML MP +R G+ GL G ++ AG SF+ATS E+ MR++TV+++AD++ 
Sbjct: 213 FAGQTIYSYNYMLKGMPCIYREGvDGLFVGYSKKAGASFVATSvENQMRVITVVIiNADQSH 272 

Query: 241 roKYARFTATNSLI^ITNTYEPNLVLAKGAAYKGKEASTODGKEQSVIAVAKNDLKWQ 300 

+D A F TN LL Y+ ++ ++ K V D E++V VA+M L ++ 

Sbjct: 273 EDDLAIFKTTNQLLQYLLINFQKVQLIENNKPV- -KTliYVLDSPEKTVKLVAQNSLFFIK 330 

Query: 301 KKNITKQNQLKINFIO<E-LTAPITKKEISrLGKAYYvDLNKVGKGYLIKEPSvHLVAKDSIE 359 

+ +N 4 I K + AP++K + LG+A D + +G+GYL PS++L+ + +1 
Sbjct: 331 PIHTKTKNTVHITKKSSTM1APLSKGQVLGRATLQDKHLIGQGYLDTPPSINLILQKNIS 390 

Query: 360 RS FFLKVWWNHFVRYVNEKL 3 79 

+SFFLKVWWN FVRYVN L 
Sbjct: 391 KSFELKvWWNRFVRYVNTSL 410 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1856 

A DNA sequence (GBSx1963) was identified in S.agalactiae <SEQ ID 5765> which encodes the amino
acid sequence <SEQ ID 5766>. This protein is predicted to be penicillin binding protein 4 (pdp4) (dacA-1).
Analysis of this protein sequence reveals the following:

Possible site: 23
>>> Seems to have a cleavable N-term signal seq.
INTEGRAL Likelihood =-12.58 Transmembrane 368 - 384 ( 363 - 394)

Final Results

bacterial membrane --- Certainty=0.6031 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA60582 GB:X87104 penicillin binding protein 4 [Staphylococcus 
aureus] 

Identities = 117/333 (35%) , Positives = 188/333 (56%) , Gaps = 8/333 (2%) 

55 Query: 5 IVSFLCILLSLTCVNSVQAEEHKDIMQITREAGY-DVKDINKPKASIVIDNKGHILWEDN 63 

1+ LC+ LS+ + A +Q + GY + +P +++ + G +L4+ N 

Sbjct: 7 IIIILCLTLSIMTPYAQAANSDVTPVQAANQYGYAGLSAAYEPTSAVNVSQTGQLLYQYN 56 



Query: 64 ADLERDPASMSKMFTLYLLFEDIAKGKTSIiNTTVTATETDQAlSKIYEISNNNIHAGVAY 123 
60 D + +PASM+K+ T+YL E + KG+ SL+ TVT T + +S + E+SN ++ G + 
Sbjct: 67 IDTKtWPASMTKlMTi^LTLEAVWKGQLSLDDTVTMraKEYIMSTLPELSNTKLYPGQVW 126

Query: 124 PIRELITMTAVPSSNVATIMIANHL8QNNPDAPIKRINETAKKLGMTKTHFYNPSGAVAS 183
 I +L+ +T SSN A +++A +S+N D F+ +N AK +GM THF NP+GA S
Sbjct: 127 TIADLLQIWSNSSNAAALILAKKVSKCTTSD-FVnLM^^ 185

Query: 184 AFNGLYS PKEYDNNATNVTTARDLS I LTYHFLKKYPDI LNYTKYPEVKAMTCTPYEETFT 243
 ++P +Y + VTTARD +IL H +K+ P IL++T K + T + T+
Sbjct: 186 RLR-TFAPTKYKDQERTVTTARDYAILDLHVI KETPKI LDFT KQLAPTTHAVTYY 239

Query: 244 TYNYSTPGAKFGLEGVDGLKTGSSPSAAFNALVTAKRQNTRLITVVLGVGDWSDQDGEYY 303
 T+N+S GAK L G DGLKTGSS +A +N +T KR R+ V++G GD+ + GE
Sbjct: 240 TFNFSLEGAKMSLPGTDGLKTGSSDTANYNHTITTKRGKFRINQVIMGAGDYKNLGGEKQ 299

Query: 304 RHPFVNALVEKGFKDAKNISSKTPVLKAVKPKK 336
 R+ NAL+E+ F K + + + + KK
Sbjct: 300 RNMMGNALMERSFDQYKYVKILSKGEQRIKGKK 332






A related DNA sequence was identified in S.pyogenes <SEQ ID 5767> which encodes the amino acid
sequence <SEQ ID 5768>. Analysis of this protein sequence reveals the following:

Possible site: 23
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-15.18 Transmembrane 371 - 387 ( 364 - 392) 

Final Results 

bacterial membrane --- Certainty=0.7071 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAA62899 GB:X91786 penicillin-binding protein 4 [Staphylococcus 
aureus] 

Identities = 119/328 (36%) , Positives = 184/328 (55%) , Gaps = 19/328 (5%) 

Query: 6 ILTIFTFICF--SVMPLVHAEDVMDIT RQAGYT-VSEVNRPKSSIVVDANSSDIL 57
+C S+M D+T Q GY +S P S++ V + + +L
Sbjct: 4 LISIIIILCLTLSIMTPYAQATHSDVTPVQAANQYGYAGLSAAYEPTSAVNV-SQTGQLL 62

Query: 58 WQDNIDIPRDPASMSKMFTLYILFEELAKGKITMDTTITATPTDQAIANIYEISNNNIVA 117
+Q NID +PASM+K+ T+Y+ E + KG++++D T+T T + ++ + E+SN +
Sbjct: 63 YQYNIDTKWNPASMTKLMTMYLTLEAVNKGQLSLDDTVTMTNKEYIMSTLPELSNTKLYP 122

Query: 118 GVAYPIRDLITMTAVPSSNAATVMIANYLSMWDASAFIDRVNATAKQLGMTNTHFSNASG 177
G + I DL+ +T SSNAA +++A +S N S F+D +N AK +GM NTHF N +G
Sbjct: 123 GQVWTIADLLQIWSNSSNAAALII1AKKVSKN-TSDFVDLMNNIQUCAIGMKNTHFVNPTG 181

Query: 178 AAAQAFQGYYNPTKYDLSASNITTARDLSKLLYAFLKKYPEIISFTNKSVVHTMVGTPYE 237
A + + PTKY +TTARD + L +K+ P+I+ FT + T+ T
Sbjct: 182 AENSRLR-TFAPTKYKDQERTVTTARDYAILDLHVIKETPKILDFTKQLAPTTLAVT--- 237

Query: 238 EEFHTYNHSLPDNQFGMKGVDGLKTGSSPSAAFNAMITAKRGKTRLITIVMGVGDWSDQN 297
++T+N SL + + G DGLKTGSS +A +N IT KRGK R+ ++MG GD+ +
Sbjct: 238 --YYTFNFSLEGAKMSLPGTDGLKTGSSDTANYNHTITTKRGKFRINQVIMGAGDYKNLG 295

Query: 298 GEFYRHPFVNALTEKGF KDSKTLSK 322
GE R+ NAL E+ F K K LSK
Sbjct: 296 GEKQRNMMGNALMERSFDQYKYVKILSK 323





An alignment of the GAS and GBS proteins is shown below.

Identities = 226/382 (59%) , Positives = 289/382 (75%) , Gaps = 7/382 (1%) 

Query: 12 LLSLTCVNSVQAEEHKDIMQITREAGYDVKDINKPKASIVID-NKGHILWEDNADLERDP 70
+ + C + + +D+M ITR+AGY V ++N+PK+SIV+D N ILW+DN D+ RDP
Sbjct: 9 IFTFICFSVMPLVHAEDVMDITRQAGYTVSEVNRPKSSIVVDANSSDILWQDNIDIPRDP 68

Query: 71 ASMSKMFTLYLLFEDLAKGKTSLNTTVTATETDQAISKIYEISNNNIHAGVAYPIRELIT 130
ASMSKMFTLY+LFE+LAKGK +++TT+TAT TDQAI+ IYEISNNNI AGVAYPIR+LIT
Sbjct: 69 ASMSKMFTLYILFEELAKGKITMDTTITATPTDQAIANIYEISNNNIVAGVAYPIRDLIT 128

Query: 131 MTAVPSSNVATIMIANHLSQNNPDAFIKRINETAKKLGMTKTHFYNPSGAVASAFNGLYS 190
MTAVPSSN AT+MIAN+LS N+ AFI R+N TAK+LGMT THF N SGA A AF G Y+
Sbjct: 129 MTAVPSSNAATVMIANYLSMTDASAFIDRVNATAKQLGMTNTHFSNASGAAAQAFQGYYN 188

Query: 191 PKEYDNNATNVTTARDLSILTYHFLKKYPDILNYTKYPEVKAMVGTPYEETFTTYNYSTP 250
P +YD +A+N+TTARDLS L Y FLKKYP+I+++T V MVGTPYEE F TYN+S P

Query: 251 GAKFGLEGVDGLKTGSSPSAAFNALVTAKRQNTRLITVVLGVGDWSDQDGEYYRHPFVNA 310
+FG++GVDGLKTGSSPSAAFNA++TAKR TRLIT+V+GVGDWSDQ+GE+YRHPFVNA
Sbjct: 249 DNQFGMKGVDGLKTGSSPSAAFNAMITAKRGKTRLITIVMGVGDWSDQNGEFYRHPFVNA 308

Query: 311 LVEKGFKDAKNISSKT-PVLKAVKPKKEVTKTKTKSIQE--QPQTKEQWWTKTDQFIQSH 367
L EKGFKD+K +S K L+ + P+ TK +T S Q+ + K+ + + + F+ +
Sbjct: 309 LTEKGFKDSKTLSKKARQKLEKLVPQ TKKETSSKQQHFKATKKQSYLERVEDFMNHN 365

Query: 368 FVSILIVLGTIAILCLLAGIVL 389
+LI L I LL +V+
Sbjct: 366 HTFLLICLAIFIITILLLSLVV 387
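
The summary line that precedes each alignment can be checked mechanically. The following is a minimal sketch, not part of the original analysis: the names `parse_summary` and `truncated_percent` are hypothetical, and the code assumes that the printed percentages are integer-truncated ratios of count to aligned length. That assumption agrees with the GAS/GBS summary line above but is not stated in the text.

```python
import re
from typing import Dict, Tuple

# Matches e.g. "Identities = 226/382 (59%)", tolerating stray spaces from scanning.
_FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_summary(line: str) -> Dict[str, Tuple[int, int, int]]:
    """Return {field: (count, aligned_length, reported_percent)} for one summary line."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)), int(m.group(4)))
        for m in _FIELD.finditer(line)
    }


def truncated_percent(count: int, length: int) -> int:
    """Integer percentage with the fractional part dropped (an assumption, see lead-in)."""
    return (100 * count) // length


# Summary line of the GAS/GBS alignment shown above.
line = "Identities = 226/382 (59%) , Positives = 289/382 (75%) , Gaps = 7/382 (1%)"
stats = parse_summary(line)
for name, (count, length, reported) in stats.items():
    print(name, count, length, reported, truncated_percent(count, length))
```

A mismatch between the reported and recomputed value on another summary line would point to a scanning error in one of the digits rather than to a different alignment.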



A related GBS gene <SEQ ID 8917> and protein <SEQ ID 8918> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 9 
McG: Discrim Score: -14.02

GvH: Signal Score (-7.5): -2.54

Possible site: 60
>>> Seems to have no N-terminal signal sequence
ALOM program count: 1 value: -12.58 threshold: 0.0
INTEGRAL Likelihood =-12.58 Transmembrane 339 - 355 ( 334 - 365)

PERIPHERAL Likelihood = 1.38 99 
modified ALOM score: 3.02 



*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.6031 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF01254(301 - 1386 of 1698) 

EGAD|40430|42591(32 - 419 of 431) penicillin binding protein 4 (pdp4) {Staphyloc 

aureus} GP|1125682|emb|CAA60585.1||X87105 penicillin binding protein 4 {Staphylococcus

aureus} GP|1125686|emb|CAA60582.1||X87104 penicillin binding protein 4 {Staphylococcus

%Match =17.3

%Identity =36.3 %Similarity =59.6

Matches = 123 Mismatches = 130 Conservative Sub.s = 79 

264 294 324 351 381 411 441 471 

FPLHFIIPDLCKLCAS*RHKDIMQITREaGY-DVKDINKPKASIVIDNKGHIL1'EDNADLERDPASMSKMFTLYLLFEDL 
= 1 = II = :| = = = = h:|:» I I = >||||:|: | = || =| : 
ILCLTLSIMTPYAQAANSDVTPVQAaNQYGYAGLSAAYEPTSAVNVSQTGQLLYQYNIDTKWNPASMTKLMTMYLTLEAV 



501 531 561 591 621 651 681 711 

AKGKTSIOTTVTATETDQAISKIYEISNNNIEmGVAYPIRELITMTAVPSSNVATIMIANHLSQNNPDAFIKRINETAKK 
Ih 11= III I = =1 = 1 = 11 -1=1 =h =1 III I = = = l =hl I |= =| || 
NKGQLSLDDTVTMTNKEYIMSTLPELSNTKLYPGQVWTIADLLQIWSNSSNA
LGMTKTHFYNPSGAVASAFNGLYSPKEYDNNATNVTTARDLSILTYHFLKKYPDIL^
••II III I :| : lllill --II I H: I l|>>|| : I » 1= I : I 

IGMKNTHFWPTGAENSR-LRTFAPTKYKDQERTVTTARDYAILDLHVIKETPKILDFTK QLAPTTHAVTYYTFN 

180 190 200 210 220 230 240 

981 1011 1041 1071 1101 1131 1161 

YSTPGAKFGLEGVDGLKTGSSPSAAFNALVTAKRQNTRLIWVLGVGDWSDQDGEYYRHPFVNALVEKGFKDAK 

= i iii i i iiiiiiii =i u u ii i= mi u= = ii h mm 1 i 

FSLEGAKMSLPGTDGLKTGSSDTAlraraTITTKRGKFRINQVIMGAGDYKNLGGEKQR^IMMGNALMERSFDQYKYVKILS 
260 270 280 290 300 310 320 

1179 1209 1239 1266 

NISSKTPVLKAVKPKKEVTKTICTKSI -QEQPQ 

I : | := :| : | ||: :|:| 
KGEQRINGKKYYVENDLYDVLPSDFSKKDYKLVVEDGKVHADYPREFI^ 

340 350 360 370 380 390 400 

1296 1326 1356 1386 1416 1446 1476 1506 

TKEQWWTKTDQFIQSHFVSILIVLGTIAILCLLAGIVLLIKRSR**LC*YKSPLHQ*HRGFLLSLEIFN*PTEPSIS*EI 
:= I HUM I:: 

LFTIIGGACLVAGLALIVHMIINRLFRKRK 

410 420 430 

SEQ ID 8918 (GBS379) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 68 (lane 5; MW 44kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 72 (lane 3; MW 68.9kDa). 

GBS379-GST was purified as shown in Figure 212, lane 7. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
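
The "Final Results" blocks in this example list each candidate location with a certainty value. As an illustration only, the sketch below reads such a block and picks the location with the highest certainty; `parse_final_results` and `best_location` are hypothetical names, and the pattern is an assumption designed to tolerate the stray spaces that scanning can leave inside numbers.

```python
import re
from typing import List, Tuple

# One "Final Results" line: location, certainty value, verdict in parentheses.
# Whitespace inside the number (e.g. "0 . 6031") is accepted and removed later.
_LINE = re.compile(
    r"bacterial\s+(membrane|outside|cytoplasm)\W*Certainty\s*=\s*"
    r"([\d.\s]+?)\s*\(\s*([A-Za-z ]+?)\s*\)"
)


def parse_final_results(text: str) -> List[Tuple[str, float, str]]:
    """Return (location, certainty, verdict) for every result line found in text."""
    results = []
    for m in _LINE.finditer(text):
        certainty = float(re.sub(r"\s+", "", m.group(2)))
        results.append((m.group(1), certainty, m.group(3)))
    return results


def best_location(results: List[Tuple[str, float, str]]) -> Tuple[str, float, str]:
    """Pick the entry with the highest certainty (first one wins on ties)."""
    return max(results, key=lambda r: r[1])


# Values reported for SEQ ID 8918 in this example, in as-scanned spacing.
block = """
bacterial membrane --- Certainty=0 . 6031 (Affirmative)
bacterial outside --- Certainty=0 . 0000 (Not Clear)
bacterial cytoplasm --- Certainty=0. 0000 (Not Clear)
"""
results = parse_final_results(block)
print(best_location(results))
```

For SEQ ID 8918 this selects the membrane prediction, in line with the single transmembrane segment reported at residues 339 - 355.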

Example 1857 

A DNA sequence (GBSx1964) was identified in S.agalactiae <SEQ ID 5769> which encodes the amino
acid sequence <SEQ ID 5770>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.4039 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 

D+GEYK+GFHD 1+ + +GL + ++ E+S K EP+WMLDFRLKSLE F MPM W 
Sbjct: 7 DIGEYKYGFHDKDVSIFRSERGLTKEIVEEISRMKEEPQWMLDFRLKSLEHFYNMPMPQW 66 

Query: 74 GADLSDIDFDDIIYYQKASDKPARDWDDVPEKIKETFERIGIPEAERAYIiAGASAQYESE 133 
G DL+ ++FD+I YY K S++ R WD+VPE+1K+TF+++GIPEAE+ YLAG SAQYESE 

Sbjct: 



Query: 134 VVYHNMKEEYDKLGIVFTDTDSALKEYPELFKKYFAKLVPPTDNKLAALNSAVWSGGTFI 193 

WYHNMKE+ + GIVF DTDSALKE ++F++++AK-H-PPTDNK AAENSAVWSGG+FI 
Sbjct: 127 VVYHNMKEDLEACjGIVFKDTDSALKENEDIFREHWAKVIPPTDNKFAALNSAVWSGGSFI 186 
Query: 194 YVPKGVKVDIPLQTYFRINNENTGQFERTLIIVDEGASVHYVEGCTAPTYSSNSLHAAIV 253

YVPKGVKV+ PLQ YFRIN+EN GQFERTLI IVDE ASVHYVEGCTAP Y++NSLH+A+V 
Sbjct: 187 YVPKGVKVETPLOAYFRINSENMGQFERTLIIVDEEASVHY'/EGCTAPVYTTNSLHSAW 246 

Query: 254 EIFALDGAYMRYTTIQNWSD1WYNLVTKRATAKKDATVEWIDGNLGAKTTMKYPSVYLDG 313 

EI G Y RYTTIQNW++NVYNLVTKR +++AT+EWIDGN+G+K TMKYP+ L G 
Sbjct: 247 EIIVKKGGYCRYTTIQNWANNVY^VTKRTVCEENATMEWIDGMIGSKLTMKYPACILKG 306 

Query: 314 EGARGTMLS IAFANKGQHQDTGAKM I HMAPHT3 SS I VS KS I AKGGGKVDYRGQVTFNKDS 373 

EGARG LSIA A KGQHQD GAKMIH AP+TSS+IVSKSI+K GGKV YRG V F + + 
Sbjct: 307 EGARGMTLS IALAGKGQHQDAGAKKI HLAPNTS ST I VS KS I S KQGGKVTYRGI VHFGRKA 366 

Query: 374 KKSVSHIECDTILMDDISKSDTIPFNEIHNSQVALEHEAKVSKISEEQLYYLMSRGLSEA 433 

+ + S4-IECDT++MD+ S SDTIP+NEI N ++LEHEAKVSK+SEEQL+YLMSRG+SE 
Sbjct: 367 EGARSNIECDTLlMDNKSTSDTIPYNEILNDNISLEHEAiO/SKVSEEQLFYLMSRGISEE 426 



A related DNA sequence was identified in S.pyogenes <SEQ ID 5771> which encodes the amino acid
sequence <SEQ ID 5772>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3780 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 445/472 (94%), Positives = 461/472 (97%) 

Sbjct: 1

Query: 61
SLETFNKMPMQTWGADLSDI+FDDI I YYQKASDKPAR WDDVPEKIKETF+RIGIPEAER
Sbjct: 61

Query: 121
AYLAGASAQYESEWYHNMK E++KLGI+FTDTDSALKEYP+LFK+YFAKLVPPTDNKLA
Sbjct: 121

Query: 181
ALNSA WSGGTFIYVPKGVKVDIPLQTYFRINNBRTGQFERTLIIVDEGASVHYVEGCTA
Sbjct: 181

Query: 241
PTYSSNSLHAAI VE I FALDGAYMRYTT I QNWSDNVYNLVTKRA A DATVEWIDGNLGA
Sbjct: 241

Query: 301
KTTMKYPSVYLDG GARGTMLSIAFAN GQHQDTGAKM1HNAPHTSSSIVSKSIAK GGK
Sbjct: 301

Query: 361
VDYRGQVTFNK SKKSVSHIECDTILMDDISKSDTIPFNEIHNSQVALEHEAKVSKISEE
Sbjct: 361

Query: 421
QIiYYLMSRGLSE+EATEMIVKGnfEPFTKELPMEYAVELNRLISYEMEGSVG
Sbjct: 421
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1858 

A DNA sequence (GBSx1965) was identified in S.agalactiae <SEQ ID 5773> which encodes the amino
acid sequence <SEQ ID 5774>. This protein is predicted to be nitrogen fixation protein (nifU). Analysis of 
this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1078 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 4 SKLDNLYMAWADHSKHPHHHGFLEGVEQVQLNNPTCGDVISLSVKFDGNIISDIAFAGN 63 

+ LD LY V+ DH K+P + G L V +NNPTCGD I L++K DG+I+ D F G 

Sbjct: 5 ANLDTLYRQVIMHYKNPIOTKGvLNDSIvvDMNNPTCGDRIRLTMKLDGDIVEDAKFEGE 64 

Query: 64 GCTISTASSSMMTDAVIGKTKEEALQIADWSKMVQGDQNPKQEKLGDAEFLAGVSKFPQ 123 

GC+IS AS+SMMT A+ GK E AL ++ +FS M+QG + LGD E L GVSKFP 

Sbjct: 65 GCSISMASASMMTQAIKGKDIETALSMSKIFSDMMQGKEYDDSIDLGDIEALQGVSKFPA 124 

Query: 124 RIKCATLSWNALRKAIERD 142
RIKCATLSW AL K + ++ 
Sbjct: 125 RIKCATLSWKALEKGVAKE 143 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5775> which encodes the amino acid 
sequence <SEQ ID 5776>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1202 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 114/146 (78%) , Positives = 133/146 (91%) 

Query: 1 M7ALSKLDNLYMAWADHSKHPHHHGFLEGVEQVQLNNPTCGDVISLSVKFDGNIISDIAF 60 

MALSKL++LYMAWADHSK PHHHG L+GVE VQLNNPTCGDVI SL+ VKFD + I DIAF 
Sbjct: 1 MALSKLNHLY^WADHSKRPHHHGQLDGVEAVQLNNPTCGDVISLTVKFDEDKIEDIAF 60 

Query: 61 AGNGCTISTASSSIWTDAVIGKIKEEALQLADVFSKMVQGDQNPKQEKLGDAEFIAGVSK 120 

AGNGCTISTASSSMMTDAVIGK+KEEAIi LAD+FS+MVQG +NP Q++LG+AE LAGV+K 
Sbjct: 61 AGNGCTISTASSSMMTDAVIGKSKEEAIAIADIF3EMVQGQENPAQKELGEAELLAGVAK 120 

Query: 121 FPQRIKCATLSWNALRKAIERDNQAE 146 

FPQRI KC+TL+WNAL+ +AI +R A+ 
Sbjct: 121 FPQRI KCSTLAWNALKEAI KRSANAQ 146 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1859

A DNA sequence (GBSx1966) was identified in S.agalactiae <SEQ ID 5777> which encodes the amino
acid sequence <SEQ ID 5778>. This protein is predicted to be nitrogen fixation protein (nifS) (bl680).
Analysis of this protein sequence reveals the following: 

Possible site: 43

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2453 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB15258 GB:Z99120 similar to NifS protein homolog [Bacillus subtilis]
Identities = 240/400 (60%), Positives = 306/400 (76%), Gaps = 5/400 (1%)

Query: 9 LKQDFPILNQLVNDEPLIYLDNAATTQKPNQVLEALRDYYQNDNANVHRGVHTLAERATA 68
+++ FPIL+Q VN L+YLD+AAT+QKP V+E L YY N+NVHRGVHTL RAT
Sbjct: 6 IREQFPILHQQVNGHDLVYLDSAATSQKPRAVIETLDKYYNQYNSNVHRGVHTLGTRATD 65

Query: 69 QYENAREKARQFLNAKLSKEILFTRGTTTGLNWVA-KFAESILERGDEVLISIMEHHSNI 127
YE AREK R+F+NAK EI+FT+GTTT LN VA +A + L+ GDEV+I+ MEHH+NI
Sbjct: 66 GYEGAREKVRKFINAKSMAEIIFTKGTTTSLNMVALSYARANLKPGDEVVITYMEHHANI 125

Query: 128 IPWQQACERTGAKLVYAYLK-DGSLDLEDFYNKLSSKTKFVSLAHISNVLGCVTPVKAIA 186
IPWQQA + TGA L Y L+ DG++ LED ++S TK V+++H+SNVLG V P+K +A
Sbjct: 126 IPWQQAVKATGATLKYIPLQEDGTISLEDVRETVTSNTKIVAVSHVSNVLGTVNPIKEMA 185

Query: 187 ERVHQVGAYMVVDGAQSAPHMAIDVQDLDCDFFALSGHKMLGPTGIGVLYGKESILDKMP 246

+ H GA +VVDGAQS PHM IDVQDLDCDFFALS HKM GPTG+GVLYGK+++L+ M
Sbjct: 186 KIAHDNGAVIVVDGAQSTPHMKIDVQDLDCDFFALSSHKMCGPTGVGVLYGKKALLENME 245

Query: 247 PVEFGGEMIDFVYEQSATWKELPWKFEAGTPNIAGAIAFGEALDYLTDVGMDEIHQYEQS 306 

P EFGGEMIDFV +TWKELPWKFEAGTP IAGAI G A+D+L ++G+DEI ++E 
Sbjct: 246 PAEFGGEMIDFVGDYESTWKELPWKFEAGTPIIAGAIGLGAAIDFLEEIGDDEISRHEHK 305

Query: 307 LVSYVLPKLQAIDGLTIYGPSDAESHVGVIAFNLEGLHPHDVATAMDYEGVAVRAGHHCA 366 

L +Y L + + +DG+T+YGP E G++ FNL+ +HPHDVAT +D EG+AVRAGHHCA 
Sbjct: 306 LAAYALERFRQLDGVTVYGP EERAGLVTFNLDDVHPHDVATVLDAEG I AVRAGHHCA 362 

Query: 367 QPLINHLGIHSAVRASFYFYNTKEDCDKLVDAIQKTKEFF 406 

QPL+ L + + RASFY YNT+E+ DKLV+A-QKTKE+F 
Sbjct: 363 QPLMCTILDVTATARASFYLYNTEEEIDKLVEALQKTKEYF 402 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5779> which encodes the amino acid 
sequence <SEQ ID 5780>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3714 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 293/408 (71%) , Positives = 349/408 (84%) 



Query: 3 LLDSYKLKQDFPILNQLVNDEPLIYLDNAATTQKPNQVLEALRDYYQNDNANVHRGVHTL 62
LLD+ +KQDF ILNQ VNDEPL+YLDNAATTQKP VLEAL+ YYQ DNANVHRGVHTL
Sbjct: 1 LLDAimiKQDFQILNQQVNDEPLVYLDNAATTQKPALVLEALQSYYQEDNANVHRGVHTL 60

Query: 63 AERATAQYENAREKARQFLNAKLSKEILFTRGTTTGLNWVAKFAESILERGDEVLISIME 122
AERAT +YE +R++ F++AK SKE+LFTRGTTT LNWVA+FAE +L DEVLISIME
Sbjct: 61 AERATLKYEASRQQVADFIHAKSSKEVLFTRGTTTSLNWVARFAEQVLTPEDEVLISIME 120

Query: 123 HHSNIIPWQQACERTGAKLVYAYLKDGSLDLEDFYNKLSSKTKFVSLAHISNVLGCVTPV 182
HH+NIIPWQQAC++TGA+LVY YLKDG LD++D NKL++KT+FVSL H+SNVLGC+ P+
Sbjct: 121 HHANIIPWQQACQKTGARLVYVYLKDGQLD^DU^KLTTKTRFVSLTHVSNVLGCINPI 180

Query: 183 KAIAERVHQVGAYMVVDGAQSAPHMAIDVQDLDCDFFALSGHKMLGPTGIGVLYGKESIL 242
K IA+ H GAY+VVDGAQS PH+AIDVQDLDCDFFA S HKMLGPTG+GVLYGKE +L
Sbjct: 181 KEIAKLAHAKGAYLVVDGAQSVPHLAIDVQDLDCDFFAFSAHKMLGPTGLGVLYGKEELL 240

Query: 243 DKMPPVEFGGEMIDFVYEQSATWKELPWKFEAGTPNIAGAIAFGEALDYLTDVGMDEIHQ 302
+++ P+EFGGEMIDFVYEQ ATWKELPWKFEAGTP+IAGAI A+ YL +GM +IH
Sbjct: 241 NQVVPLEFGGEMIDFVYEQEATWKELPWKFEAGTPEIAGAIGLSAAISYLQRLGMADIHA 300

Query: 303 YEQSLVSYVLPKLQAIDGLTIYGPSDAESHVGVIAFNLEGLHPHDVATAMDYEGVAVRAG 362
+E L++YVLPKL+AI+GLTIYGPS + G+I+FNL+ LHPHD+ATA+DYEGVAVRAG
Sbjct: 301 HEAELIAYVLPKLEAIEGLTIYGPSQPSARSGLISFNLDDLHPHDLATALDYEGVAVRAG 360

Query: 363 HHCAQPLINHLGIHSAVRASFYFYNTKEDCDKLVDAIQKTKEFFNGTL 410
HHCAQPL+++LG+ + VRASFY YNTK DCD+LV+AI K KEFFNGTL
Sbjct: 361 HHCAQPLLSYLGVPATVRASFYIYNTKADCDRLVEAILKAKEFFNGTL 408



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1860 

A DNA sequence (GBSx1967) was identified in S.agalactiae <SEQ ID 5781> which encodes the amino
acid sequence <SEQ ID 5782>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1441 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB07189 GB:AP001518 unknown conserved protein [Bacillus halodurans]
Identities = 171/430 (39%) , Positives = 267/430 (61%) , Gaps = 15/430 (3%) 

Query: 1 MSKFAII^FLQAKGEPTWIiQELRLKftFEKIEEIiELPVIERVKFHRWNLG--DGTILENDY 58 

+ KE + +F A+ EP W +++RLK FE +E LELP ++ K WN D + E 
Sbjct: 9 IDKEYVQSFSDARNEPQWFKDIRLKGFELVETLELPKPDKTKITSWNFTNFDHKLPEVSP 68 

Query: 59 TANVPDFTE LGNNPKLVQIGTQTVLEQVPMEliIEKGWFTDFYSALEEIPE 109 

A++ + + LVQ V ++ Li KGV+FTD +A++E + 

Sbjct: 69 VASIDELRDEVKGLIGEASDTQNLLVQRDATVVYSKLDEALKAI^GVIFTDLLTAVKEHGD 128 

Query: 110 VIERYFGK-ARPFEEDRLAAYETAYFNSGAVLYIPDNVEITQPIEGLFYQDSQSKVPFNK 168 

++E+Y+ K A +E+RL AHA N G +Y+P NVEI P++ +F+ D++ FN 
Sbjct: 129 LVEKYYMKDAVKVDENFiTALHAALWGGTFIYVPFJWEIEVPLQSVFWFDTEKAGDFN- 187 

Query: 169 HILblVGKNAKVSYLERFESIGDGTERTSANISVEVIAQAGSQIKFASIDRLGENVTTFI 228 

H++++ N+ ++Y+E + S G +E ANI VEV A A +++ F ++D h VTT++ 
Sbjct: 188 HVI IVAEDNSS ITYVFJSIYASFG- -SEEAVANIWEVFAGANAKVSFGAVDNLAAGVTTYV 245 

Query: 229 SRRGRHSSDATIDWALGVTvINEGN^/ADFDSDLIGDGSHANrLKVVAASSGRQVQGIDTRVT 288 

RR D+ ++WALG MN+GN V++ + L+GD S A+ K V+ G Q Q T++ 

Sbjct: 246 VRRAEVGRDSRVEWALGQMNDGNTVSFjNTTHLLGDNSWAOTKTVSVGRGEQKQNFTTQIF 305 

Query: 289 NYGCNSVGHILQHGVILERGTLTFNGIGHIIKGAKGADAQQESRVLMLSDiffiRSDANPIL 348 



